Abstract

A discrete-time single-user scalar channel with temporally correlated Rayleigh fading is analyzed. 
There is no side information at the transmitter or the receiver. A simple expression is given for the 
capacity per unit energy, in the presence of a peak constraint. The simple formula of Verdú for capacity
per unit cost is adapted to a channel with memory, and is used in the proof. In addition to bounding the
capacity of a channel with correlated fading, the result gives some insight into the relationship between 
the correlation in the fading process and the channel capacity. The results are extended to a channel 
with side information, showing that the capacity per unit energy is one nat per Joule, independently of 
the peak power constraint. 

A continuous-time version of the model is also considered. The capacity per unit energy subject 
to a peak constraint (but no bandwidth constraint) is given by an expression similar to that for discrete 
time, and is evaluated for Gauss-Markov and Clarke fading channels. 

Index Terms 

Capacity per unit cost, channel capacity, correlated fading, flat fading, Gauss-Markov fading.

I. INTRODUCTION 

Consider communication over a stationary Gaussian channel with Rayleigh flat fading.
The channel operates in discrete time, and there is no side information about the channel at
either the transmitter or the receiver. The broad goal is to find or bound the capacity of such a
channel. The approach taken is to consider the capacity per unit energy. Computation of capacity
per unit energy is relatively tractable, due to the simple formula of Verdú [1] (also see Gallager
[2]). The study of capacity per unit energy naturally leads one in the direction of low SNR,
since capacity per unit energy is typically achieved at low SNR. However, it is known that
to achieve capacity or capacity per unit energy at low SNR, the optimal input signal becomes 
increasingly bursty [3-5]. Moreover, such capacity per unit energy becomes the same as for the 
additive Gaussian noise channel, and the correlation function of the fading process does not enter 
into the capacity. This is not wholly satisfactory, both because very large burstiness is often not 
practical, and because one suspects that the correlation function of the fading process is relevant. 

To model the practical infeasibility of using large peak powers, this paper investigates the 
effect of hard-limiting the energy of each input symbol by some value P. A simple expression 
is given for the capacity per unit energy under such a peak constraint. The correlation of the 
fading process enters into the capacity expression found. 

When channel state information is available at the receiver (coherent channel), the capacity
per unit energy under a peak constraint evaluates to one nat per Joule. Continuous-time
channels are also considered. An analogous peak power constraint is imposed on the input signal.
The capacity per unit energy expression is similar to that for the discrete-time channel. 

An alternative approach to constraining input signal burstiness is to constrain the fourth 
moments, or kurtosis, of input signals [4-6]. This suggests evaluating the capacity per unit energy 
subject to a fourth moment constraint on the input. We did not pursue the approach because it 
is not clear how to capture the constraint in the capacity per unit cost framework, whereas a 
peak constraint simply restricts the input alphabet. Also, a peak constraint is easy to understand, 
and matches well with popular modulation schemes such as phase modulation. Since a peak
constraint $|X| \le \sqrt{P}$ on a random variable $X$ implies $E[|X|^4] \le P\,E[|X|^2]$, the bound of Médard
and Gallager [4] involving fourth moments yields a bound for a peak constraint, as detailed in
Appendix I.

The results offer some insight into the effect that correlation in the fading process has on 
the channel capacity. There has been considerable progress on computation of capacity for fading 
channels (see for example Telatar [7], and Marzetta and Hochwald [8]). This paper examines a 
channel with stationary temporally correlated Gaussian fading. The notion of capacity per unit 
energy is especially relevant for channels with low signal to noise ratio. Fading channel capacity 
for high SNR has recently been of interest (see [9] and references therein). 

The material presented in this paper is related to some of the material in [10] and [11]. 
Similarities of this paper to [10] are that both consider the low SNR regime, both have correlated 
fading, and the correlation of the fading is relevant in the limiting analysis. An important 
difference is that [10] assumes the receiver knows the channel. Other differences are that, here, a 
peak constraint is imposed, the wideband spectral efficiency is not considered, and the correlation 
is in time rather than across antennas. Similarities of this paper with [11] are that both impose a 
peak constraint, but in [11] only the limit of vanishingly small peak constraints is considered, and
correlated fading random processes are not considered. The papers [4] and [5] are also related.
They are more general in that doubly-selective fading is considered, but they do not consider a
peak constraint. 

The organization of this paper is as follows. Preliminary material on capacity per unit
time and per unit cost of fading channels with memory is presented in Section II. The formula
for capacity per unit energy for Rayleigh fading is presented in Section III. The results are
applied in Section IV to two specific fading models, namely, the Gauss-Markov fading channel
and the Clarke fading channel. Proofs of the results are organized into Sections V-VIII. The
conclusion is in Section IX. All capacity computations are in natural units for simplicity. One
natural unit, nat, is $\frac{1}{\log(2)} = 1.4427$ bits.



II. Preliminaries 

Shannon [12] initiated the study of information to cost ratios. For discrete-time memoryless
channels without feedback, Verdú [1] showed that, in the presence of a unique zero-cost
symbol in the input alphabet, the capacity per unit cost is given by maximizing a ratio of a 
divergence expression to the cost function. The implications of a unique zero-cost input symbol 
were studied by Gallager [2] in the context of reliability functions per unit cost. In this section, 
the theory of capacity per unit cost is adapted to fading channels with memory with the cost 
metric being transmitted energy. Additionally, a peak constraint is imposed on the input alphabet. 

Consider a single-user discrete-time channel without channel state information at either 
transmitter or receiver. The channel includes additive noise and multiplicative noise (flat fading), 
and is specified by 

$$Y(k) = H(k)X(k) + W(k), \quad k \in \mathbb{Z} \qquad (1)$$

where $X$ is the input sequence, $H$ is the fading process, $W$ is an additive noise process, and
$Y$ is the output. The desired bounds on the average and peak transmitted power are denoted by
$P_{ave}$ and $P_{peak}$.

An $(n, M, \nu, P, \epsilon)$ code for this channel consists of $M$ codewords, each of block length
$n$, such that each codeword $(X_{m1}, \ldots, X_{mn})$, $m = 1, \ldots, M$, satisfies the constraints

$$\sum_{i=1}^{n} |X_{mi}|^2 \le \nu, \qquad (2)$$

$$\max_{1 \le i \le n} |X_{mi}|^2 \le P, \qquad (3)$$

and the average (assuming equiprobable messages) probability of decoding the correct message
is greater than or equal to $1 - \epsilon$.
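The two codeword constraints can be checked mechanically. The following is a minimal sketch, not part of the original development: the function name `satisfies_constraints` and the small ON-OFF codebook are hypothetical, chosen only to illustrate the total-energy constraint (2) and the peak constraint (3).

```python
def satisfies_constraints(codebook, nu, P):
    """Return True if every codeword meets constraints (2) and (3).

    codebook: list of codewords, each a list of (possibly complex) symbols.
    nu: energy budget per codeword; P: bound on each squared symbol magnitude.
    """
    for codeword in codebook:
        energies = [abs(x) ** 2 for x in codeword]
        if sum(energies) > nu:   # total-energy constraint (2)
            return False
        if max(energies) > P:    # peak constraint (3)
            return False
    return True

# Hypothetical ON-OFF codebook with ON value 2.0, i.e. ON energy 4.0 per symbol.
on = 2.0
codebook = [[on, on, 0, 0], [0, 0, on, on], [0, 0, 0, 0]]

print(satisfies_constraints(codebook, nu=8.0, P=4.0))
print(satisfies_constraints(codebook, nu=8.0, P=3.0))
```

The all-zero codeword is allowed and costs no energy, which is the zero-cost input symbol that the capacity per unit cost framework relies on.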
Two definitions of capacity per unit time for the above channel are now considered. Their
equivalence is established in Proposition 2.1 for a certain class of channels. Capacity per unit
energy is then defined and related to the definitions of capacity per unit time, and a version of
Verdú's formula is given.

Definition 2.1: Operational capacity: A number $R$ is an $\epsilon$-achievable rate per unit time if
for every $\gamma > 0$, there exists $n_0$ sufficiently large so that if $n > n_0$, there exists an $(n, M, nP_{ave}, P_{peak}, \epsilon)$
code with $\log M > n(R - \gamma)$. A nonnegative number $R$ is an achievable rate per unit time if
it is $\epsilon$-achievable for all $0 < \epsilon < 1$. The operational capacity, $C_{op}(P_{ave}, P_{peak})$, is the maximum of
the achievable rates per unit time.

For any $n \in \mathbb{N}$ and $P > 0$, let

$$D_n(P) = \{x \in \mathbb{C}^n : |x_i|^2 \le P \text{ for } 1 \le i \le n\}. \qquad (4)$$



Definition 2.2: Information theoretic capacity: The mutual information theoretic capacity
is defined as follows, whenever the indicated limit exists:

$$C_{info}(P_{ave}, P_{peak}) = \lim_{n \to \infty} \sup_{P_{X_1^n}} \frac{1}{n} I(X_1^n; Y_1^n), \qquad (5)$$

where the supremum is over probability distributions $P_{X_1^n}$ on $D_n(P_{peak})$ such that

$$\frac{1}{n} E_{P_{X_1^n}}\left[\|X_1^n\|_2^2\right] \le P_{ave}. \qquad (6)$$

Similarly, $\overline{C}_{info}$ and $\underline{C}_{info}$ are defined by:

$$\overline{C}_{info}(P_{ave}, P_{peak}) = \sup_{n} \sup_{P_{X_1^n}} \frac{1}{n} I(X_1^n; Y_1^n), \qquad (7)$$

$$\underline{C}_{info}(P_{ave}, P_{peak}) = \liminf_{n \to \infty} \sup_{P_{X_1^n}} \frac{1}{n} I(X_1^n; Y_1^n), \qquad (8)$$

where the suprema are over probability measures $P_{X_1^n}$ on $D_n(P_{peak})$ that satisfy (6).

For memoryless channels, results in information theory imply the equivalence of Definitions 2.1
and 2.2. This equivalence can be extended to channels with memory under mild conditions.
In this regard, the following definitions for mixing, weakly mixing and ergodic processes are
quoted from [13, §5] for ease of reference (also see [14, p. 70]).

Let $\phi_i(z_1, z_2, \ldots, z_n)$ ($i = 1, 2$) be bounded measurable functions of an arbitrary number
of complex variables $z_1, \ldots, z_n$. Let $M_t$ be the operator $\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T}$ for discrete time, and
$\lim_{T \to \infty} \frac{1}{T} \int_0^T dt$ for continuous time. A stationary stochastic process $z(t)$ ($t \in \mathbb{Z}$ for discrete-time
processes, and $t \in \mathbb{R}$ for continuous-time processes¹) is said to be:

¹In this paper, continuous-time processes are assumed to be mean square continuous.

1) Strongly mixing (a.k.a. mixing) if, for all choices of $\phi_1$, $\phi_2$, and times $t_1, \ldots, t_n$, $t_1^*, \ldots, t_n^*$,

$$\psi(t) \equiv E[\phi_1(z(t_1), \ldots, z(t_n))\, \phi_2(z(t_1^* + t), \ldots, z(t_n^* + t))] - E[\phi_1(z(t_1), \ldots, z(t_n))] \cdot E[\phi_2(z(t_1^*), \ldots, z(t_n^*))] \to 0 \text{ as } t \to \infty, \qquad (9)$$

2) weakly mixing if, for all choices of $\phi_1$, $\phi_2$, and times $t_1, \ldots, t_n$, $t_1^*, \ldots, t_n^*$,

$$M_t[\psi^2(t)] = 0, \qquad (10)$$

3) ergodic if, for all choices of $\phi_1$, $\phi_2$, and times $t_1, \ldots, t_n$, $t_1^*, \ldots, t_n^*$,

$$M_t[\psi(t)] = 0. \qquad (11)$$

In general, strongly mixing implies weakly mixing, and weakly mixing implies ergodicity.
Suppose a discrete-time or continuous-time random process $H$ is a mean zero, stationary, proper
complex Gaussian process. Then, $H$ is weakly mixing if and only if its spectral distribution
function $\{F_H(\omega) : -\pi \le \omega \le \pi\}$ is continuous, or, equivalently, if and only if $M_t[|R_H(t)|^2] = 0$,
where $R_H$ is the autocorrelation function of $H$. Also, $H$ is mixing if and only if $\lim_{t \to \infty} R_H(t) = 0$
[13, Theorem 9]. It follows from the Riemann-Lebesgue theorem that $H$ is mixing if $F_H$ is
absolutely continuous. Furthermore, $H$ is ergodic if and only if it is weakly mixing. To see this,
it suffices to show that $H$ is not ergodic if $F_H$ is not continuous. Suppose $F_H$ has a discontinuity
at, say, $\lambda$. Let $U_\lambda = M_t[H(t) e^{-j \lambda t}]$. Clearly, $U_\lambda$ is zero-mean proper complex Gaussian. Also,
$E[|U_\lambda|^2] = F_H(\lambda + 0) - F_H(\lambda - 0)$ [13, Theorem 3]. Note that $|U_\lambda|$ is invariant to time-shifts
of the process $H$. Since $|U_\lambda|$ is a non-degenerate shift-invariant function of $H$, it follows that
$H$ is not an ergodic process [14, 5.2].
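The weak-mixing criterion, that the time average of the squared magnitude of the autocorrelation vanishes, can be explored numerically. The sketch below replaces the limit by a finite average, so it can only suggest the limiting behavior. The helper name `cesaro_mean_sq`, the horizon of 2000 lags, and the two example autocorrelation functions are illustrative assumptions.

```python
import cmath

def cesaro_mean_sq(R, T):
    """Finite-T stand-in for the time average of |R(t)|^2 over t = 1..T."""
    return sum(abs(R(t)) ** 2 for t in range(1, T + 1)) / T

rho = 0.9   # illustrative correlation coefficient, R(t) = rho^|t|
lam = 0.3   # illustrative location of a point mass in the spectrum

# Geometrically decaying autocorrelation: the average shrinks like 1/T.
decaying = cesaro_mean_sq(lambda t: rho ** abs(t), 2000)

# Pure tone R(t) = exp(j*lam*t): |R(t)| = 1 for all t, so the average stays at 1.
tone = cesaro_mean_sq(lambda t: cmath.exp(1j * lam * t), 2000)

print(decaying, tone)
```

The first case corresponds to a continuous spectral distribution, and the second to a spectral distribution with a jump, which is the non-ergodic situation described above.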

The following proposition is derived from notions surrounding information stability (see
[14, 15]) and the Shannon-McMillan-Breiman theorem for finite alphabet ergodic sources. A
simple proof is given in Section V-A.

Proposition 2.1: If $H$ and $W$ are stationary weakly mixing processes, and if $H$, $W$ and
$X$ are mutually independent, then for every $P_{ave}, P_{peak} > 0$, $C_{info}(P_{ave}, P_{peak})$ is well defined
(i.e., $\overline{C}_{info}(P_{ave}, P_{peak}) = \underline{C}_{info}(P_{ave}, P_{peak})$), and $C_{info}(P_{ave}, P_{peak}) = C_{op}(P_{ave}, P_{peak})$.

Since ergodicity is equivalent to weakly mixing for Gaussian processes, the above proposition
then implies that the two definitions of capacity coincide for the channel modeled in (1) if $H$
and $W$ are stationary ergodic Gaussian processes and $H$, $W$ and $X$ are mutually independent.

Following [1], the capacity per unit energy is defined along the lines of the operational
definition of capacity per unit time, $C_{op}(\cdot)$, as follows.

Definition 2.3: Given $0 < \epsilon < 1$, a nonnegative number $R$ is an $\epsilon$-achievable rate per
unit energy with peak constraint $P_{peak}$ if for every $\gamma > 0$, there exists $\nu_0$ large enough such that
if $\nu > \nu_0$, then an $(n, M, \nu, P_{peak}, \epsilon)$ code can be found with $\log M > \nu(R - \gamma)$. A nonnegative
number $R$ is an achievable rate per unit energy if it is $\epsilon$-achievable for all $0 < \epsilon < 1$. Finally,
the capacity $C_p(P_{peak})$ per unit energy is the maximum achievable rate per unit energy.

The subscript $p$ denotes the fact that a peak constraint is imposed. It is clear from the definitions
that, for any given $0 < \epsilon < 1$, if $R$ is an $\epsilon$-achievable rate per unit time, then $R/P_{ave}$ is an
$\epsilon$-achievable rate per unit energy. It follows that $C_p(P_{peak})$ can be used to bound from above
the capacity per unit time, $C_{op}(P_{ave}, P_{peak})$, for a specified peak constraint $P_{peak}$ and average
constraint $P_{ave}$, as follows.

$$C_{op}(P_{ave}, P_{peak}) \le P_{ave}\, C_p(P_{peak}). \qquad (12)$$

The following proposition and its proof are similar, with minor differences, to [1, Theorem 2],
given for memoryless sources.

Proposition 2.2: Suppose $C_{op}(P_{ave}, P_{peak}) = C_{info}(P_{ave}, P_{peak})$ for $0 < P_{ave} \le P_{peak}$
(see Proposition 2.1 for sufficient conditions). Then capacity per unit energy for a peak constraint
$P_{peak}$ is given by

$$C_p(P_{peak}) = \sup_{P_{ave} > 0} \frac{C_{op}(P_{ave}, P_{peak})}{P_{ave}} \qquad (13)$$

$$= \sup_{n} \sup_{P_{X_1^n}} \frac{I(X_1^n; Y_1^n)}{E\left[\|X_1^n\|_2^2\right]}, \qquad (14)$$

where the last supremum is over probability distributions on $D_n(P_{peak})$. Furthermore,

$$C_p(P_{peak}) = \lim_{n \to \infty} \sup_{X \in D_n(P_{peak})} \frac{D(P_{Y|X} \,\|\, P_{Y|0})}{\|X\|_2^2}. \qquad (15)$$



The proof is given in Section V-B. For $P_{peak}$ fixed, $C_{op}(P_{ave}, P_{peak})$ is a concave non-decreasing
function of $P_{ave}$. This follows from a simple time-sharing argument. It follows that

$$\sup_{P_{ave} > 0} \frac{C_{op}(P_{ave}, P_{peak})}{P_{ave}} = \lim_{P_{ave} \to 0} \frac{C_{op}(P_{ave}, P_{peak})}{P_{ave}}. \qquad (16)$$

So, the supremum in (13) can be replaced by a limit.

If $H$ and $W$ are i.i.d. random processes so that the channel is memoryless, then the
suprema over $n$ in (14) and (15) are achieved by $n = 1$. Proposition 2.2 then becomes a special
case of Verdú's results [1], which apply to memoryless channels with general alphabets and
general cost functions.

Equation (15), which is analogous to [1, Theorem 2], is especially useful because it
involves a supremum over $D_n(P_{peak})$ rather than over probability distributions on $D_n(P_{peak})$.
This is an important benefit of considering capacity per unit cost when there is a zero cost input.
It is noted that the natural extension of the corollary following [1, Theorem 2] also applies here.
The proof is identical:

Corollary 2.1: Suppose $C_{op}(P_{ave}, P_{peak}) = C_{info}(P_{ave}, P_{peak})$ for $0 < P_{ave} \le P_{peak}$
(see Proposition 2.1 for sufficient conditions). Rate $R$ is achievable per unit energy with peak
constraint $P_{peak}$ if and only if for every $0 < \epsilon < 1$ and $\gamma > 0$, there exist $s > 0$ and $\nu_0$, such that
if $\nu > \nu_0$, then an $(n, M, \nu, P_{peak}, \epsilon)$ code can be found with $\log M > \nu(R - \gamma)$ and $n < s\nu$.

For the remainder of this paper, the fading process H is assumed to be stationary and 
ergodic. Both H and the additive noise W are modeled as zero mean proper complex Gaussian 
processes, and without loss of generality, are normalized to have unit variance. Further, W is 
assumed to be a white noise process. The conditions of Proposition 2.1 are satisfied, and so the
two definitions of capacity per unit time are equivalent. Henceforth, the capacity per unit time
is denoted by $C(P_{ave}, P_{peak})$. Also, for brevity, in the remainder of the paper, a peak power
constraint is often denoted by $P$ instead of $P_{peak}$.



III. Main results 

A. Discrete-time channels 

The main result of the paper is the following. 

Proposition 3.1: Let $S(\omega)$ denote the density of the absolutely continuous component of
the power spectral measure of $H$. The capacity per unit energy for a finite peak constraint $P$ is
given by

$$C_p(P) = 1 - \frac{I(P)}{P}, \qquad (17)$$

$$\text{where } I(P) = \int_{-\pi}^{\pi} \log(1 + P S(\omega)) \frac{d\omega}{2\pi}. \qquad (18)$$

Moreover, roughly speaking, the capacity per unit energy $C_p(P)$ can be asymptotically achieved
using codes with the following structure. Each codeword is ON-OFF with ON value $\sqrt{P}$. The
vast majority of codeword symbols are OFF, with infrequent long bursts of ON symbols. See
the end of Section VI-B for a more precise explanation.
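The formula of Proposition 3.1 is easy to evaluate numerically for a given spectral density. The following is a minimal sketch using a midpoint rule; the function name `capacity_per_unit_energy`, the grid size, and the two example densities (a flat density for i.i.d. fading, and a hypothetical density equal to 2 on half of the band) are assumptions made for illustration.

```python
import math

def capacity_per_unit_energy(S, P, num_points=20000):
    """Evaluate 1 - I(P)/P, with I(P) the integral of log(1 + P*S(w)) dw/(2*pi)
    over [-pi, pi], approximated by the midpoint rule."""
    h = 2 * math.pi / num_points
    total = 0.0
    for k in range(num_points):
        w = -math.pi + (k + 0.5) * h
        total += math.log1p(P * S(w))
    I = total * h / (2 * math.pi)
    return 1.0 - I / P

# i.i.d. fading with unit variance: flat density S = 1, so I(P) = log(1 + P).
flat = lambda w: 1.0

# Hypothetical correlated fading: all power in |w| < pi/2, still unit variance.
half_band = lambda w: 2.0 if abs(w) < math.pi / 2 else 0.0

c_flat = capacity_per_unit_energy(flat, 1.0)
c_half = capacity_per_unit_energy(half_band, 1.0)
print(c_flat, c_half)
```

In this example the more concentrated spectrum gives the larger value, consistent with the observation that the correlation of the fading process enters the capacity expression.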

Suppose that in the above channel model, channel side information (CSI) is available at
the receiver. The fading process is assumed to be known causally at the receiver; i.e., at time step
$k$, the receiver knows $\{H(n) : n \le k\}$. For this channel, an $(n, M, \nu, P, \epsilon)$ code, achievable rates
and the capacity per unit energy for peak constraint $P$, denoted by $C_p^{coh}(P)$, are respectively
defined in a similar manner as for the same channel without CSI.

Proposition 3.2: For $P > 0$, $C_p^{coh}(P) = 1$.
There is an intuitively pleasing interpretation of Proposition 3.1. Note that $C_p(P) =
C_p^{coh}(P) - \frac{1}{P} I(P)$. The term $\frac{1}{P} I(P)$ can be interpreted as the penalty for not knowing the
channel at the receiver. The integral $I(P)$ is the information rate between the fading channel
process and the output when the signal is deterministic and identically $\sqrt{P}$ (see Section VI-C).
When ON-OFF signaling is used with ON value $\sqrt{P}$ and long ON times, the receiver gets
information about the fading channel at rate $I(P)$ during the ON periods, which thus subtracts
from the information that it can learn about the input. Similar observations have been previously
made in different contexts [16, 17].

The definition of $C_p(P)$ still makes sense if $P = \infty$, and $C_p(\infty)$ is the capacity per unit
energy with no peak constraint. It is well known that $C_p(\infty) = 1$ (see [5, p. 816], [17-19]). Note
that, by (17) and (18), as $P \to \infty$, $C_p(P) \to 1 = C_p(\infty)$. By their definitions, both $C_p^{coh}(P)$ and
$C_p(\infty)$ are upper bounds for $C_p(P)$. The bounds happen to be equal: $C_p^{coh}(P) = C_p(\infty) = 1$.

Another upper bound on $C_p(P)$ is $U_p(P)$, defined by

$$U_p(P) = \frac{P}{2} \int_{-\pi}^{\pi} S^2(\omega) \frac{d\omega}{2\pi}. \qquad (19)$$

The fact $C_p(P) \le U_p(P)$ follows easily from (17), (18) and the inequality $\log(1 + x) \ge x - \frac{x^2}{2}$
for $x \ge 0$. Also, $C_p(P) \sim U_p(P)$ as $P \to 0$. It is shown in Appendix I that the bound $U_p(P)$
is also obtained by applying an inequality of Médard and Gallager [4].
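The relation between the capacity per unit energy and this second upper bound can be checked numerically. The sketch below assumes the bound has the form $(P/2) \int S^2(\omega)\, d\omega/(2\pi)$, which is what the inequality $\log(1+x) \ge x - x^2/2$ yields when $S$ integrates to one. The helper names and the example spectrum (a Gauss-Markov spectrum with correlation coefficient 0.5) are illustrative choices.

```python
import math

def spectral_average(f, num_points=20000):
    """Midpoint-rule approximation of the integral of f(w) dw/(2*pi) over [-pi, pi]."""
    h = 2 * math.pi / num_points
    return sum(f(-math.pi + (k + 0.5) * h) for k in range(num_points)) / num_points

def cp(S, P):
    """Capacity per unit energy 1 - I(P)/P."""
    return 1.0 - spectral_average(lambda w: math.log1p(P * S(w))) / P

def up(S, P):
    """Assumed form of the upper bound: (P/2) times the average of S^2."""
    return 0.5 * P * spectral_average(lambda w: S(w) ** 2)

rho = 0.5  # illustrative correlation coefficient
S = lambda w: (1 - rho**2) / (1 - 2 * rho * math.cos(w) + rho**2)

for P in (1e-3, 1.0, 10.0):
    print(P, cp(S, P), up(S, P))
```

For small peak power the two values nearly coincide, while for large peak power the bound grows linearly in $P$ and becomes loose.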



B. Extension to continuous-time channels 

The model for continuous time is the following. Let $(H(t) : -\infty < t < \infty)$ be a
continuous-time stationary ergodic proper complex Gaussian process such that $E[|H(t)|^2] = 1$.
A codeword for the channel is a deterministic signal $X = (X(t) : 0 \le t \le T)$, where $T$ is the
duration of the signal. The observed signal is given by

$$Y(t) = H(t)X(t) + W(t), \quad 0 \le t \le T \qquad (20)$$

where $W(t)$ is a complex proper Gaussian white noise process with $E[W(s)\overline{W(t)}] = \delta(s - t)$.
The mathematical interpretation of this, because of the white noise, is that the integral process
$V = (V(t) = \int_0^t Y(s)\,ds : 0 \le t \le T)$ is observed [5]. The mathematical model for the observation
process is then

$$V(t) = \int_0^t H(s)X(s)\,ds + \eta(t), \quad 0 \le t \le T, \qquad (21)$$

where $\eta$ is a standard proper complex Wiener process with autocorrelation function $E[\eta(s)\overline{\eta(t)}] =
\min\{s, t\}$. The process $V$ takes values in the space of continuous functions on $[0, T]$ with
$V(0) = 0$.
A $(T, M, \nu, P, \epsilon)$ code for the continuous-time channel is defined analogously to an
$(n, M, \nu, P, \epsilon)$ code for the discrete-time channel, with the block length $n$ replaced by the
code duration $T$, and the constraints (2) and (3) replaced by

$$\int_0^T |X(t)|^2\,dt \le \nu, \qquad (22)$$

$$\sup_{0 \le t \le T} |X(t)|^2 \le P. \qquad (23)$$

The codewords are required to be Borel measurable functions of $t$, but otherwise, no bandwidth
restriction is imposed. Achievable rates and the capacity per unit energy for peak constraint $P$,
denoted $C_p(P)$, are defined as for the discrete-time channel.

Proposition 3.3: Let $S(\omega)$ denote the density of the absolutely continuous component
of the power spectral measure of $H$. Then

$$C_p(P) = 1 - \frac{I(P)}{P}, \qquad (24)$$

$$\text{where } I(P) = \int_{-\infty}^{\infty} \log(1 + P S(\omega)) \frac{d\omega}{2\pi}. \qquad (25)$$

The proof is given in Section VIII.

The following upper bound $U_p(P)$ on $C_p(P)$ is constructed on the lines of the upper
bound on the discrete-time capacity per unit energy defined in (19).

$$U_p(P) = \frac{P}{2} \int_{-\infty}^{\infty} S^2(\omega) \frac{d\omega}{2\pi}. \qquad (26)$$

Similar to the discrete-time case, $C_p(P) \sim U_p(P)$ as $P \to 0$.


IV. ILLUSTRATIVE EXAMPLES 



Using Propositions 3.1 and 3.3, the capacity per unit energy with a peak constraint is
obtained in closed form for two specific models of the channel fading process. The channel 
models considered are Gauss-Markov fading and Clarke's fading. Finally, the capacity per unit 
energy with peak constraint is evaluated for a block fading channel with constant fading within 
blocks and independent fading across blocks. 

A. Gauss-Markov Fading 

1) Discrete-time channel: Consider the channel modeled in (1). Let the fading process
$H$ be Gauss-Markov with autocorrelation function $R_H(k) = \rho^{|k|}$ for some $\rho$ with $0 < \rho < 1$.

Corollary 4.1: The capacity per unit energy for peak constraint $P$, for the Gauss-Markov
fading channel, is given by

$$C_p(P) = 1 - \frac{\log(z_+)}{P}, \qquad (27)$$

where $z_+$ is the larger root of the quadratic equation $z^2 - (1 + P + \rho^2(1 - P))z + \rho^2 = 0$. The
bound $U_p(P)$ simplifies to the following:

$$U_p(P) = \frac{P}{2} \cdot \frac{1 + \rho^2}{1 - \rho^2}. \qquad (28)$$

For the proof of the above corollary, see Appendix II.
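The closed form of Corollary 4.1 can be compared against direct numerical integration. The sketch below assumes the Gauss-Markov spectral density $S(\omega) = (1 - \rho^2)/(1 - 2\rho\cos\omega + \rho^2)$, which is the Fourier series of $\rho^{|k|}$. The function names and parameter values are illustrative.

```python
import math

def cp_gauss_markov(P, rho):
    """Closed form 1 - log(z_plus)/P, with z_plus the larger root of
    z^2 - (1 + P + rho^2*(1 - P))*z + rho^2 = 0."""
    b = 1 + P + rho**2 * (1 - P)
    z_plus = (b + math.sqrt(b * b - 4 * rho**2)) / 2
    return 1 - math.log(z_plus) / P

def cp_numerical(P, rho, num_points=20000):
    """Midpoint-rule evaluation of 1 - I(P)/P for the same spectrum."""
    h = 2 * math.pi / num_points
    total = 0.0
    for k in range(num_points):
        w = -math.pi + (k + 0.5) * h
        S = (1 - rho**2) / (1 - 2 * rho * math.cos(w) + rho**2)
        total += math.log1p(P * S)
    return 1 - (total / num_points) / P

for rho in (0.0, 0.5, 0.9):
    print(rho, cp_gauss_markov(2.0, rho), cp_numerical(2.0, rho))
```

Setting the correlation coefficient to zero in the closed form gives $z_+ = 1 + P$, which recovers the i.i.d. fading value $1 - \log(1 + P)/P$.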

The upper bounds $C_p^{coh}(P)$ and $U_p(P)$ are compared to $C_p(P)$ as a function of peak power
$P$ in Figure 1 for $\rho = 0.9$ and $\rho = 0.999$. The figures illustrate the facts that $C_p(P) \to C_p^{coh}(P)$
in the limit as $P \to \infty$, i.e., when the peak power constraint is relaxed, and that $C_p(P) \sim U_p(P)$
as $P \to 0$. In Figure 2, $C_p(P)$ and $U_p(P)$ are plotted as functions of $\rho$, for various values
of peak constraint $P$.

It is common in some applications to express the peak power constraint as a multiple
of the average power constraint. Consider such a relation, where the peak-to-average ratio is
constrained by a constant $\beta$, so

$$P = \beta P_{avg}.$$

From (12) and (19), we get the following bounds on the channel capacity per unit time. To get
the final expressions in (30) and (31), $P$ is substituted by $\beta P_{avg}$ in the expressions for $C_p(P)$
and $U_p(P)$ in (27) and (28).

$$C(\beta P_{avg}, P_{avg}) \le C_p^{coh}(\beta P_{avg}) \cdot P_{avg} = P_{avg} \qquad (29)$$

$$C(\beta P_{avg}, P_{avg}) \le C_p(\beta P_{avg}) \cdot P_{avg} = P_{avg} - \frac{\log(z_+)}{\beta} \qquad (30)$$

$$C(\beta P_{avg}, P_{avg}) \le U_p(\beta P_{avg}) \cdot P_{avg} = \frac{1 + \rho^2}{1 - \rho^2} \cdot \frac{\beta P_{avg}^2}{2} \qquad (31)$$

Here, $z_+$ is the larger root of the quadratic equation $z^2 - (1 + \beta P_{avg} + \rho^2(1 - \beta P_{avg}))z + \rho^2 = 0$.

The bounds are plotted for various values of $\rho$ and $\beta$ in Figures 3 and 4. The average
power $P_{avg}$ (x axis) is in log scale. All the capacity bounds converge at low power to zero. The
fourthegy bound $U_p(P)$ tends to increase faster than $C_p(P)$ for higher $\beta$, i.e., more relaxed peak
to average ratio. A similar behavior is observed when the correlation coefficient $\rho$, and hence
coherence time, is increased. Note that the case when $\beta = 1$ corresponds to having only a peak
power constraint, and no average power constraint.
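The three bounds on capacity per unit time can be tabulated for a given peak-to-average ratio. The sketch below reads them as $P_{avg}$ (coherent), $P_{avg} - \log(z_+)/\beta$ (peak constrained), and $\frac{1+\rho^2}{1-\rho^2} \cdot \frac{\beta P_{avg}^2}{2}$ (fourthegy). The last form is an assumption, obtained by evaluating the fourthegy bound for the Gauss-Markov autocorrelation with Parseval's theorem. The function name and parameter values are illustrative.

```python
import math

def rate_bounds(P_avg, beta, rho):
    """Upper bounds (nats per symbol) on capacity per unit time for
    Gauss-Markov fading with peak power P = beta * P_avg."""
    P = beta * P_avg
    b = 1 + P + rho**2 * (1 - P)
    z_plus = (b + math.sqrt(b * b - 4 * rho**2)) / 2
    coherent = P_avg                                  # receiver knows the channel
    peak = P_avg - math.log(z_plus) / beta            # P_avg * C_p(beta * P_avg)
    fourthegy = (1 + rho**2) / (1 - rho**2) * beta * P_avg**2 / 2  # assumed form
    return coherent, peak, fourthegy

for P_avg in (0.01, 0.1, 1.0):
    print(P_avg, rate_bounds(P_avg, beta=5.0, rho=0.9))
```

At low average power the peak-constrained and fourthegy bounds are both far below the coherent bound, which reflects the penalty for not knowing the channel.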

Fig. 3. Capacity bounds for the discrete-time channel, $\beta = 1$

Fig. 4. Capacity bounds for the discrete-time channel, $\beta = 5$

2) Continuous-time channel: Consider the channel modeled in (20). As in the discrete-time
case considered above, let the fading process $H$ be a Gauss-Markov process with autocorrelation
$R_H(t) = \rho^{|t|}$, where the parameter $\rho$ satisfies $0 < \rho < 1$. The power spectral density
$\{S(\omega) : -\infty < \omega < \infty\}$ is given by

$$S(\omega) = \frac{-2 \log \rho}{\omega^2 + (\log \rho)^2}.$$

The capacity per unit energy with peak constraint $P$ is obtained by using the above expression
for the power spectral density in (24) and simplifying using the following standard integral [20,
Section 4.22, p. 525]:

$$\int_0^{\infty} \log\left(\frac{a^2 + x^2}{b^2 + x^2}\right) dx = (a - b)\pi, \quad a > 0, \; b > 0.$$

It follows that

$$C_p(P) = 1 - \frac{1}{P}\left(\sqrt{(\log \rho)^2 - 2P \log \rho} + \log \rho\right). \qquad (33)$$

The upper bound $U_p(P)$ in (26) is evaluated using Parseval's theorem.

$$U_p(P) = \frac{P}{-2 \log \rho}. \qquad (34)$$

In Figure 5, the capacity per unit energy with peak constraint $C_p(P)$ and the upper bound $U_p(P)$
are plotted and compared as functions of peak power $P$ for various values of $\rho$. In Figure 6,
$C_p(P)$ and $U_p(P)$ are plotted as functions of $\rho$, for various values of peak constraint $P$.
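The continuous-time Gauss-Markov expressions can be cross-checked numerically. The sketch below assumes the closed forms $C_p(P) = 1 - \frac{1}{P}(\sqrt{c^2 + 2Pc} - c)$ and $U_p(P) = P/(2c)$ with $c = -\log\rho$, and compares the first against a midpoint-rule integral after the substitution $\omega = c\tan\theta$, which maps the infinite frequency axis to a finite interval. Function names and parameter values are illustrative.

```python
import math

def cp_closed_form(P, rho):
    c = -math.log(rho)                      # c > 0 for 0 < rho < 1
    return 1 - (math.sqrt(c * c + 2 * P * c) - c) / P

def up_closed_form(P, rho):
    return P / (-2 * math.log(rho))

def cp_numerical(P, rho, num_points=20000):
    """Evaluate 1 - I(P)/P with w = c*tan(theta), theta in (-pi/2, pi/2)."""
    c = -math.log(rho)
    h = math.pi / num_points
    total = 0.0
    for k in range(num_points):
        theta = -math.pi / 2 + (k + 0.5) * h
        cos2 = math.cos(theta) ** 2
        # Under the substitution, S(w) = 2*cos2/c and dw = (c/cos2) dtheta.
        total += math.log1p(2 * P * cos2 / c) * c / cos2
    I = total * h / (2 * math.pi)
    return 1 - I / P

for P in (0.1, 1.0, 10.0):
    print(P, cp_closed_form(P, 0.5), cp_numerical(P, 0.5), up_closed_form(P, 0.5))
```

The midpoint rule never samples the endpoints, so the division by the squared cosine stays finite.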



B. Clarke's Fading 

Fast fading manifests itself as rapid variations of the received signal envelope as the
mobile receiver moves through a field of local scatterers (in a mobile radio scenario). Clarke's
fading process [21, Chapter 2, p. 41] is a continuous-time proper complex Gaussian process with
power spectral density given by

$$S(2\pi f) = \begin{cases} \dfrac{1}{\pi f_m \sqrt{1 - (f/f_m)^2}}, & |f| < f_m \\ 0, & \text{elsewhere} \end{cases} \qquad (35)$$

where $f_m$ is the maximum Doppler frequency shift and is directly proportional to the vehicle
speed. The model is based on the assumption of isotropic local scattering in two dimensions.
Consider a continuous-time channel modeled in (20), with the fading process following
Clarke's fading model.

Corollary 4.2: For a time-selective fading process $H$ with the power spectral density
given by (35), the capacity per unit energy for peak constraint $P$ is given by Cp(P) = g(P) -
p log( j) + 1 - |, where g{P) is given by 



9{P) 



v/r^75^(f-arctan^) P>1 
^P-^ - 1 (^/m(arctan ^^)) P < 1 




Fig. 6. Capacity per unit energy and upper bounds as a function of $\rho$
Here, $\mathrm{Im}(z)$ is the imaginary part of the complex number $z$.

Corollary 4.2 is obtained by evaluating the integral in (25). Note that, for this channel, the
integral in (26) diverges, so that $U_p(P) = +\infty$.
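For Clarke's spectrum the capacity integral can also be evaluated numerically without the closed form. The sketch below assumes the standard unit-power Clarke density $1/(\pi f_m \sqrt{1 - (f/f_m)^2})$ on $|f| < f_m$ and uses the substitution $f = f_m \sin\theta$, which removes the square-root singularity at the band edges. The function name `cp_clarke` and the sample values are illustrative.

```python
import math

def cp_clarke(P, f_m, num_points=20000):
    """Evaluate 1 - I(P)/P for Clarke's spectrum, with I(P) the integral of
    log(1 + P*S) over frequency f, using f = f_m*sin(theta)."""
    a = P / (math.pi * f_m)
    h = math.pi / num_points
    total = 0.0
    for k in range(num_points):
        theta = -math.pi / 2 + (k + 0.5) * h
        c = math.cos(theta)
        # On |f| < f_m: P*S = a/c, and df = f_m*c dtheta.
        total += math.log1p(a / c) * c
    I = f_m * total * h
    return 1 - I / P

for f_m in (0.1, 1.0, 10.0):
    print(f_m, cp_clarke(1.0, f_m))
```

A larger maximum Doppler shift, meaning faster fading, lowers the capacity per unit energy for a fixed peak power, since the receiver spends more of the ON periods learning the channel.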

C. Block Fading 

Suppose the channel fading process $H$, in either discrete time (1) or continuous time
(20), is replaced by a block fading process with the same marginal distribution, but which is
constant within each block (of length $T$) and independent across blocks.

Corollary 4.3: The capacity per unit energy with peak constraint $P$ of a block fading
channel (discrete-time or continuous-time), block length $T$, is given by $C_p^{bf}(P, T) = 1 - \frac{\log(1 + PT)}{PT}$.

See Appendix III for the proof. Note that for $P$ fixed, $\lim_{T \to \infty} C_p^{bf}(P, T) = 1$.

The capacity per unit time of the above channel with peak constraint $P$ and average
power constraint $P_{avg}$, denoted by $C^{bf}(P, P_{avg})$, can be bounded from above using Corollary
4.3, (12) and the inequality $\log(1 + x) \ge x - \frac{x^2}{2}$ for $x \ge 0$ as follows.

$$C^{bf}(P, P_{avg}) \le \frac{T}{2} P_{avg} P \qquad (36)$$

In the presence of a peak-to-average ratio constraint, $\beta$ say, the bound on the capacity is quadratic
in $P_{avg}$ for small values of $P_{avg}$. Similarly, the mutual information in a Rayleigh fading channel
(MIMO setting) is shown in [6] to be quadratic in $P_{avg}$, as $P_{avg} \to 0$.
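The block fading expressions are simple enough to tabulate directly. The sketch below assumes the closed form $1 - \log(1 + PT)/(PT)$ for the capacity per unit energy and the looser bound $\frac{T}{2} P_{avg} P$ on capacity per unit time, with $P = \beta P_{avg}$. Function names and parameter values are illustrative.

```python
import math

def cp_block_fading(P, T):
    """Capacity per unit energy for block fading: 1 - log(1 + P*T)/(P*T)."""
    x = P * T
    return 1 - math.log1p(x) / x

def rate_bound(P_avg, beta, T):
    """Quadratic bound (T/2) * P_avg * P on capacity per unit time, P = beta * P_avg."""
    return 0.5 * T * P_avg * (beta * P_avg)

for T in (1, 10, 100, 1000):
    print(T, cp_block_fading(1.0, T))

print(rate_bound(0.01, beta=5.0, T=10), 0.01 * cp_block_fading(0.05, 10))
```

With block length one the formula reduces to the i.i.d. fading value, and it approaches one as the block length grows, since a long block lets the receiver learn the fading at negligible cost per unit energy.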

V. Proofs of Propositions in Section II

A. Proof of Proposition 2.1

It is first proved that $\underline{C}_{info} \ge C_{op}$. Given $\epsilon > 0$, for all large $n$ there exists an
$(n, M, nP_{ave}, P_{peak}, \epsilon)$ code with $\frac{1}{n} \log M \ge C_{op} - \epsilon$. Letting $X_1^n$ represent a random codeword,
with all $M$ possibilities having equal probability, Fano's inequality implies that $I(X_1^n; Y_1^n) \ge
(1 - \epsilon) \log M - \log 2$, so that $I(X_1^n; Y_1^n) \ge (1 - \epsilon) n (C_{op} - \epsilon) - \log 2$. Therefore, $\underline{C}_{info} \ge
(1 - \epsilon)(C_{op} - \epsilon)$. Since $\epsilon$ is arbitrary, the desired conclusion, $\underline{C}_{info} \ge C_{op}$, follows. It remains
to prove the reverse inequality.

Consider the definition of $\overline{C}_{info}$. For $P_{peak}$ fixed, using a simple time-sharing argument, it
can be shown that $\overline{C}_{info}(P_{ave}, P_{peak})$ is a concave non-decreasing function of $P_{ave}$. Consequently,
given $\epsilon > 0$, there exists an $n \ge 1$, $\delta > 0$, and distribution $P_{X_1^n}$ on $D_n(P_{peak})$ such that
$\frac{1}{n} I(X_1^n; Y_1^n) \ge \overline{C}_{info} - \epsilon$ with $E[\|X_1^n\|_2^2] \le nP_{ave} - 2\delta$.
Since the mutual information between arbitrary random variables is the supremum of the
mutual information between quantized versions of the random variables [14, 2.1], there exist
vector quantizers $q : D_n(P_{peak}) \to A$ and $r : \mathbb{C}^n \to B$, where $A$ and $B$ are finite sets, such
that $\frac{1}{n} I(q(X_1^n); r(Y_1^n)) \ge \frac{1}{n} I(X_1^n; Y_1^n) - \epsilon \ge \overline{C}_{info} - 2\epsilon$. By enlarging $A$ if necessary, it can be
assumed that for each $a$, $q^{-1}(a)$ is a subset of one of the energy shells $\{x \in D_n(P_{peak}) : K_a \delta \le
\|x\|_2^2 \le (K_a + 1)\delta\}$ for some integer $K_a$. Therefore, with $l$ defined on $A$ by $l(a) = \sup\{\|x\|_2^2 :
x \in q^{-1}(a)\}$, it follows that $\|x\|_2^2 \ge l(a) - \delta$ for all $x \in q^{-1}(a)$, for all $a \in A$. Hence,

$$E[l(q(X_1^n))] \le E[\|X_1^n\|_2^2] + \delta \le nP_{ave} - \delta. \qquad (37)$$

For each $a \in A$, let $\gamma_a$ be the conditional probability measure of $X_1^n$ given that $q(X_1^n) = a$.
Then $\gamma = (\gamma_a : a \in A)$ is the transition kernel for a memoryless channel with input alphabet
$A$ and output alphabet $D_n(P_{peak})$. Define a new channel $\tilde{\nu}$, with input alphabet $A$ and output
alphabet $B$, as the concatenation of three channels: the memoryless channel with transition kernel
$\gamma$, followed by the original fading channel, followed by the deterministic channel given by the
quantizer $r$. The idea of the remainder of the proof is that codes for channel $\tilde{\nu}$ correspond to
random codes for the original channel.

Let $(\tilde{X}_k : k \in \mathbb{Z})$ consist of independent random variables in $A$, each with the distribution
of $q(X_1^n)$. Let $(\tilde{Y}_k : k \in \mathbb{Z})$ be the corresponding output of $\tilde{\nu}$ in $B$. Note that $(\tilde{X}_1, \tilde{Y}_1)$ has
the same distribution as $(q(X_1^n), r(Y_1^n))$, so that $I(\tilde{X}_1; \tilde{Y}_1) = I(q(X_1^n); r(Y_1^n)) \ge n(\overline{C}_{info} - 2\epsilon)$.
Since the input sequence $\tilde{X}$ is i.i.d. and the channel is stationary,

$$\frac{1}{k} I(\tilde{X}_1^k; \tilde{Y}_1^k) \ge I(\tilde{X}_1; \tilde{Y}_1) \ge n(\overline{C}_{info} - 2\epsilon).$$

Letting $(\tilde{X}_k : k \in \mathbb{Z})$ be the input to a discrete memoryless channel with transition kernel
$\gamma$ produces a process $X$ with independent length $n$ blocks, which can be arranged to form an
i.i.d. vector process $(X_{(k-1)n+1}^{kn} : k \in \mathbb{Z})$. Similarly, the processes $H$ and $W$ in the fading channel
model can be arranged into blocks to yield vector processes.

It is now shown that for any discrete-time weakly mixing process $U$, the corresponding
vector process obtained by arranging $U$ into blocks of length $n$ is also weakly mixing. For any
choice of bounded measurable functions $\Phi_i$ ($i = 1, 2$) on the vector process $(U_{(k-1)n+1}^{kn} : k \in \mathbb{Z})$,
there exist corresponding functions $\phi_i$ defined on the process $U$. Let $\psi(t)$ be defined on $\phi_i$ as
given in (9) and $\Psi(t)$ be defined analogously on $\Phi_i$. Clearly, $\Psi(t) = \psi(nt)$ and

$$\frac{1}{T} \sum_{t=1}^{T} \Psi^2(t) = \frac{1}{T} \sum_{t=1}^{T} \psi^2(nt) \qquad (38)$$

$$\le n \cdot \frac{1}{nT} \sum_{t=1}^{nT} \psi^2(t). \qquad (39)$$

It follows that

$$M_t[\Psi^2(t)] \le n M_t[\psi^2(t)] \qquad (40)$$

$$= 0, \qquad (41)$$

where (41) follows from the weakly mixing property of $U$ (10). Consequently, the vector process
obtained from $U$ is also weakly mixing. Since $H$ and $W$ are weakly mixing, it follows that the
corresponding vector processes are also weakly mixing.

Remark: It should be noted that, when an ergodic discrete-time process is arranged in
blocks to form a vector process, the resulting vector process is not necessarily ergodic. For
example, consider the following process $U$. Let $U_0$ be 0 or 1 with equal probability. Let $U_1 =
1 - U_0$, and $U_k = U_{k \bmod 2}$. The process $U$ is ergodic. However, when $U$ is arranged into a vector
process of length $n = 2$ (or any other even $n$), the vector process is not ergodic. In fact, there
exist ergodic processes such that the derived vector processes of length $n$ are not ergodic for any
n > 1. For example, let f/^ = X]ieP^i*V2\ where the processes are independent, and for 
each process V^^, {yi'\ . . . , V^) is chosen to be one of the i patterns (0 . . . 0, 1), (0 . . . 0, 1, 0), 
. . . (1, . . . 0) with probability and vj:!^^ = V^''^^^^ ■. Here, P is the set of primes. It can be 
shown that U is ergodic. For any n G P and k E N, when V^"'^ is arranged into a vector process 
of length kn, the vector process is not ergodic. Since any m G N has factors in P, it follows 
that the vector process of length m obtained from U is not ergodic either. 
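
The first example in the remark can be checked numerically. The following is a minimal sketch, with hypothetical helper names, that builds the alternating process $U_k = U_{k \bmod 2}$ and compares time averages of the scalar process with those of the length-2 vector process. The scalar time average is 1/2 for either value of $U_0$, whereas the time average of the first block coordinate equals $U_0$ and therefore depends on the realization.

```python
import numpy as np

def alternating_process(u0: int, length: int) -> np.ndarray:
    """Return U_0, ..., U_{length-1} with U_1 = 1 - U_0 and U_k = U_{k mod 2}."""
    k = np.arange(length)
    return np.where(k % 2 == 0, u0, 1 - u0)

def scalar_time_average(u: np.ndarray) -> float:
    """Time average of the scalar process."""
    return float(np.mean(u))

def block_time_average(u: np.ndarray, n: int = 2) -> float:
    """Time average of the first coordinate of the length-n vector process."""
    usable = len(u) - len(u) % n          # drop an incomplete trailing block
    blocks = u[:usable].reshape(-1, n)    # row k is the k-th block
    return float(np.mean(blocks[:, 0]))

for u0 in (0, 1):
    u = alternating_process(u0, 1000)
    print(u0, scalar_time_average(u), block_time_average(u))
```

Because the block average is a non-constant function of $U_0$, the length-2 vector process cannot be ergodic, which is the point of the remark.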

Arranging the output process Y of the fading channel into a length-n vector process, it
is clear that the k-th element of this process, $Y_{(k-1)n+1}^{kn}$, depends only on the k-th elements of the
vector processes of X, H and W. Further, the output $\tilde{Y}_k$ is a function of $Y_{(k-1)n+1}^{kn}$. Therefore,
the process $(\tilde{X}_k, X_{(k-1)n+1}^{kn}, Y_{(k-1)n+1}^{kn}, \tilde{Y}_k : k \in \mathbb{Z})$ is a weakly mixing
process. So, $\tilde{X}$ and $\tilde{Y}$ are jointly weakly mixing and hence jointly ergodic.

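Thus, the following limit exists: $I(\tilde{X};\tilde{Y}) = \lim_{k\to\infty} \frac{1}{k} I(\tilde{X}_1^k;\tilde{Y}_1^k)$, and the limit satisfies
$I(\tilde{X};\tilde{Y}) \ge n(C_{info} - 2\epsilon)$. Furthermore, the asymptotic equipartition property (AEP, or Shannon-
McMillan-Breiman theorem) for finite alphabet sources implies that

$$\lim_{k\to\infty} P\left[\left|\frac{1}{k}\, i_k(\tilde{X}_1^k;\tilde{Y}_1^k) - I(\tilde{X};\tilde{Y})\right| > \epsilon\right] = 0, \tag{42}$$

where $i_k(\tilde{X}_1^k;\tilde{Y}_1^k)$ is the logarithm of the Radon-Nikodym density of the distribution of $(\tilde{X}_1^k, \tilde{Y}_1^k)$
relative to the product of its marginal distributions.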

Since Xi has the same distribution as ^(X"), (BTl) implies that i?[/(Xi)] < nPave — ^• 
Thus, by the law of large numbers, $\lim_{k\to\infty} P\left[\frac{1}{k}\sum_{j=1}^{k} f(\tilde{X}_j) \ge nP_{ave}\right] = 0$.

Combining the facts from the preceding two paragraphs yields that, for k sufficiently
large, $P[(\tilde{X}_1^k, \tilde{Y}_1^k) \in G_k] \ge 1 - \epsilon/2$, where $G_k$ is the following subset of $A^k \times B^k$:

$$G_k = \left\{ \frac{1}{k}\, i_k(\tilde{x}_1^k;\tilde{y}_1^k) \ge n(C_{info} - 3\epsilon) \right\} \cap \left\{ \frac{1}{k}\sum_{j=1}^{k} f(\tilde{x}_j) \le nP_{ave} \right\}. \tag{43}$$

Therefore, by a variation of Feinstein's lemma, modified to take into account the average power
constraint (see below), for sufficiently large k there exists a $(k, M, \epsilon)$ code $\mathcal{C}_{A,k}$ for the channel
$\nu$ such that $\log M \ge kn(C_{info} - 4\epsilon)$ and each codeword satisfies the constraint
$\frac{1}{k}\sum_{j=1}^{k} f(\tilde{x}_j) \le nP_{ave}$.

For any message value j with $1 \le j \le M$, passing the jth codeword a of $\mathcal{C}_{A,k}$ through
the channel with transition kernel $\gamma$ generates a random codeword in $D_n(P_{peak})^k$, which we can
also view as a random codeword x in $\mathbb{C}^{nk}$. Since $\|x\|_2^2 \le \sum_{j=1}^{k} f(a_j) \le nkP_{ave}$, the random
codeword x satisfies the peak power constraint for the original channel, with probability one.
Also, the average error probability for the random codeword is equal to the error probability for
the codeword a, which is at most $\epsilon$. Since the best case error probability is no larger than the
average, there exists a deterministic choice of codeword x, also satisfying the average power
constraint and having error probability less than or equal to $\epsilon$. Making such a selection for each
codeword in $\mathcal{C}_{A,k}$ yields an $(nk, M, nkP_{ave}, P_{peak}, \epsilon)$ code for the original fading channel with
$\log M \ge nk(C_{info} - 4\epsilon)$. Since $\epsilon > 0$ is arbitrary, $C_{op}(P_{ave}, P_{peak}) \ge C_{info}(P_{ave}, P_{peak})$, as was
to be proved.

It remains to give the modification of Feinstein's lemma used in the proof. The lemma
is stated now using the notation of [15, §12.2]. The lemma can be used above by taking $A$, $A_0$,
and a in the lemma equal to $A^k$, $\{\tilde{x}_1^k \in A^k : \frac{1}{k}\sum_{j=1}^{k} f(\tilde{x}_j) \le nP_{ave}\}$, and $nk(C_{info} - 3\epsilon)$, respectively.
The code to be produced is to have symbols from a measurable subset $A_0$ of $A$.

Lemma 5.1 (Modified Feinstein's lemma): Given an integer M and a > 0 there exist
$x_j \in A_0$, j = 1, ..., M, and a measurable partition $\mathcal{F} = \{\Gamma_j ; j = 1, \ldots, M\}$ of B such that

$$\nu(\Gamma_j^c \mid x_j) \le M e^{-a} + P_{XY}\left(\{i \le a\} \cup (A_0^c \times B)\right).$$

The proof of Lemma 5.1 is the same as the proof given in [15] with the set G in [15] replaced
by the set $G = \{(x,y) : i(x,y) > a,\ x \in A_0\}$ and with $\epsilon = M e^{-a} + P_{XY}(G^c)$. The proof of
Proposition 2.1 is complete.

B. Proof of Proposition 2.2

Proof: For brevity, let $a = \sup_{P_{ave}>0} \frac{C_{op}(P_{ave}, P_{peak})}{P_{ave}}$. We wish to prove $C_p(P_{peak}) = a$.
The proof that $C_p(P_{peak}) \ge a$ is identical to the analogous proof of [1, Theorem 2].

To prove the converse, let $\epsilon > 0$. By the definition of $C_p(P_{peak})$, for any $\nu$ sufficiently
large there exists an $(n, M, \nu, P_{peak}, \epsilon)$ code such that $\log M \ge \nu(C_p(P_{peak}) - \epsilon)$. Let $X_1^n$ be a
random vector that is uniformly distributed over the set of M codewords. By Fano's inequality,
$I(X_1^n; Y_1^n) \ge (1 - \epsilon)\log M - \log 2$. Setting $P_{ave} = \nu/n$,

$$\frac{1}{P_{ave}}\, C_{info}(P_{ave}, P_{peak}) \ge \frac{1}{nP_{ave}}\, I(X_1^n; Y_1^n) \tag{44}$$

$$\ge (1-\epsilon)\left(C_p(P_{peak}) - \epsilon\right) - \frac{\log 2}{\nu}. \tag{45}$$

Using the assumption that $C_{info}(P_{ave}, P_{peak}) = C_{op}(P_{ave}, P_{peak})$ yields
$a \ge (1-\epsilon)(C_p(P_{peak}) - \epsilon) - \frac{\log 2}{\nu}$.
Since $\epsilon$ can be taken arbitrarily small and $\nu$ can be taken arbitrarily large, $a \ge C_p(P_{peak})$.
This proves dHl). Noting that Cop{Pave, Ppeak) = CinfdPave, Ppeak) by assumption, and using 
Definition 1221 it is clear that follows from ([T3t . 

Consider a time-varying fading channel modeled in discrete time as given in (1). It is
useful to consider for theoretical purposes a channel that is available for independent blocks of
duration n. The fading process is time-varying within each block. However, the fading across
distinct blocks is independent. Specifically, let $\tilde{H}$ denote a fading process such that the blocks
of length n, $(\tilde{H}(1 + kn), \ldots, \tilde{H}(n + kn))$, indexed in $k \in \mathbb{Z}$, are independent, with each having
the same probability distribution as $(H(1), H(2), \ldots, H(n))$. Let $C_{p,n}(P)$ denote the capacity
per unit energy of the channel with fading process $\tilde{H}$.

From ([T3t . 

$$C_p(P) = \sup_{n \ge 1} C_{p,n}(P). \tag{46}$$

The proof of Proposition 2.2 is completed as follows. By its definition, $C_{p,n}(P)$ is clearly
monotone nondecreasing in n, so that $\sup_n C_{p,n}(P) = \lim_{n\to\infty} C_{p,n}(P)$. Thus, the time-varying
flat fading channel is reduced to a block fading channel with independently fading blocks. The
theory of memoryless channels in [1] can be applied to the block fading channel, yielding, for
n fixed,

$$C_{p,n}(P) = \sup_{X \in D_n(P)} \frac{D(P_{Y|X} \,\|\, P_{Y|0})}{\|X\|_2^2}. \tag{47}$$

Equation dTSt follows from (l46t and (l47t . and the proof is complete. ■ 



VI. PROOF OF PROPOSITION 3.1

The proof of Proposition 3.1 is organized as follows. The capacity per unit energy is
expressed in ([131) as the supremum of a scaled divergence expression. To evaluate the supremum, 
it is enough to consider codes with only one vector X in the input alphabet, in addition to the 
all zero input vector. In Section VI-A, ON-OFF signaling is introduced. It is shown that the
supremum is unchanged if X is restricted to be an ON-OFF signal; i.e. $X_i \in \{0, \sqrt{P}\}$ for each
i. In Section VI-B, the optimal choice of input vector X is further characterized, and temporal
ON-OFF signaling is introduced. In Section VI-C, a well-known identity for the prediction error
of a stationary Gaussian process is reviewed and applied to conclude the proof of Proposition 3.1.


A. Reduction to ON-OFF Signaling 

It is shown in this section that the supremum in dTSl is unchanged if X is restricted to 
satisfy $X_i \in \{0, \sqrt{P}\}$ for each i. Equivalently, in every timeslot, the input symbol is either
0 (OFF) or $\sqrt{P}$ (ON). We refer to this as ON-OFF signaling.



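The conditional probability density [8] of the output n x 1 vector Y, given the input
vector X, is

$$p_{Y|X}(Y) = \frac{\exp\left(-\mathrm{tr}\left((I_n + X\Sigma X^\dagger)^{-1} Y Y^\dagger\right)\right)}{\pi^n \det(I_n + X\Sigma X^\dagger)} \tag{48}$$

where X denotes the n x n diagonal matrix with diagonal entries given by X, and $\Sigma$ is
the covariance matrix of the random vector $(H(1), \ldots, H(n))^T$. The divergence expression is
simplified by integrating out the ratio of the probability density functions.

$$D(p_{Y|X} \,\|\, p_{Y|0}) = \int p_{Y|X}(Y) \left\{ -\mathrm{tr}\left((I_n + X\Sigma X^\dagger)^{-1} Y Y^\dagger\right) + \mathrm{tr}(Y Y^\dagger) - \log\det(I_n + X\Sigma X^\dagger) \right\} dY$$
$$= -\mathrm{tr}\left((I_n + X\Sigma X^\dagger)^{-1}(I_n + X\Sigma X^\dagger)\right) + \mathrm{tr}(I_n + X\Sigma X^\dagger) - \log\det(I_n + X\Sigma X^\dagger)$$
$$= \mathrm{tr}(X\Sigma X^\dagger) - \log\det(I_n + X\Sigma X^\dagger)$$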

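Since the correlation matrix $\Sigma$ of the fading process is normalized, it has all ones on the main
diagonal. Thus $\mathrm{tr}(X\Sigma X^\dagger) = \sum_{i=1}^{n} |X_i|^2 = \|X\|_2^2$, so

$$C_p(P) = \lim_{n\to\infty} \sup_{X \in D_n(P)} \frac{D(p_{Y|X} \,\|\, p_{Y|0})}{\|X\|_2^2}
= 1 - \lim_{n\to\infty} \inf_{X \in D_n(P)} \frac{\log\det(I + X\Sigma X^\dagger)}{\|X\|_2^2}. \tag{49}$$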

Here $(X_1, \ldots, X_n)$ takes values over deterministic complex n x 1 vectors with $|X_i|^2 \le P$.
Consider $X = R\exp(j\Theta)$, where R is a nonnegative diagonal matrix, and $\Theta$ is diagonal with
elements $\theta_i \in [0, 2\pi]$. Using $\det(I + AB) = \det(I + BA)$ for any A, B, we get

$$\det(I + X\Sigma X^\dagger) = \det(I + X^\dagger X\Sigma) = \det(I + R^2\Sigma).$$

Hence we can restrict the search for the optimal choice of the matrix X (and hence of the input
vector signal X) to real nonnegative vectors. So, $\log\det(I_n + X\Sigma X^\dagger) = \log\det(I_n + X^2\Sigma)$.
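
The phase-invariance step above can be sanity-checked numerically. The sketch below is illustrative only: the covariance matrix is a randomly generated Hermitian positive semidefinite matrix with unit diagonal (an assumption standing in for the paper's $\Sigma$), and the function names are hypothetical. It compares $\det(I + X\Sigma X^\dagger)$ with $\det(I + R^2\Sigma)$ for a random diagonal $X = R\exp(j\Theta)$.

```python
import numpy as np

def normalized_covariance(n, rng):
    """Random Hermitian PSD matrix with ones on the diagonal (stand-in for Sigma)."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    c = g @ g.conj().T                       # Hermitian positive semidefinite
    d = np.sqrt(np.real(np.diag(c)))
    return c / np.outer(d, d)                # rescale so the diagonal is all ones

def det_with_and_without_phase(n=5, seed=0):
    """Return det(I + X Sigma X^H) and det(I + R^2 Sigma) for X = R exp(j Theta)."""
    rng = np.random.default_rng(seed)
    sigma = normalized_covariance(n, rng)
    r = rng.uniform(0.0, 2.0, n)             # nonnegative magnitudes
    theta = rng.uniform(0.0, 2.0 * np.pi, n) # arbitrary phases
    x = np.diag(r * np.exp(1j * theta))
    lhs = np.linalg.det(np.eye(n) + x @ sigma @ x.conj().T)
    rhs = np.linalg.det(np.eye(n) + np.diag(r ** 2) @ sigma)
    return lhs, rhs

lhs, rhs = det_with_and_without_phase()
print(abs(lhs - rhs))  # expected to be at the level of floating-point round-off
```

The two determinants agree up to round-off regardless of the phases, which is why only the magnitudes of the input symbols matter in what follows.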

Fix an index i with $1 \le i \le n$. Note that $\det(I_n + X^2\Sigma)$ is linear in $X_i^2$. Setting $X_i^2 = u$,
the expression to be minimized in (49) can be written as a function of u as

$$f(u) = \frac{\log\det(I + X^2\Sigma)}{\|X\|_2^2} = \frac{\log(a + bu)}{c + u}, \qquad 0 \le u \le P, \tag{50}$$

for some non-negative a, b and c. Since $\Sigma$ is positive semidefinite, all the eigenvalues of
$I_n + X^2\Sigma$ are greater than or equal to 1. Thus both the numerator and the denominator of (50) are
nonnegative. The second derivative of f(u) is given by

$$f''(u) = -\frac{2 f'(u)}{c + u} - \frac{b^2}{(c + u)(a + bu)^2}.$$

At any stationary point, $f'(u) = 0$ and hence $f''(u) \le 0$.
So, f(u) has no minima and at the most one maximum in the interval [0, P]. Since u is
constrained to be chosen from the set [0, P], f(u) (and hence the function of interest) reaches
its minimum value only when u is either 0 or P. This narrows down the search for the optimal
value of $X_i^2$ from the interval [0, P] to the set {0, P} for all $i \in \{1, \ldots, n\}$. Restricting our
attention to values of X with $X_i^2 \in \{0, P\}$, we get the following expression for capacity per unit
energy:

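$$C_p(P) = 1 - \inf_{n}\ \inf_{\substack{\{0,\sqrt{P}\}\text{-valued signals } X \\ \text{with support in } \{1,\ldots,n\}}} \frac{\log\det(I_n + X^\dagger X\Sigma)}{\|X\|_2^2}. \tag{51}$$

Consider the expression

$$\frac{\log\det(I_n + X^\dagger X\Sigma)}{\|X\|_2^2}.$$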

Here, n is the block length, while X is the input signal vector, with $X_i \in \{0, \sqrt{P}\}$. Having a
certain block length and an input signal vector has the same effect on the above expression as
having a greater block length and extending the input signal vector by appending the required
number of zeros. So, the expression does not depend on the block length n, as long as n is large
enough to support the input signal vector X.

Since the block length n does not play an active role in the search for the optimal input
signal, (51) becomes

$$C_p(P) = 1 - \inf_{k}\ \inf_{\substack{\{0,\sqrt{P}\}\text{-valued signals } X \\ \text{with energy } kP}} \frac{\log\det(I + X^\dagger X\Sigma)}{kP}. \tag{52}$$

From here onwards, it is implicitly assumed that, for any choice of input signal X, the
corresponding block length n is chosen large enough to accommodate X.
B. Optimality of Temporal ON-OFF Signaling

We use the conventional set notation of denoting the intersection of sets $A \cap B$ by $AB$
and A's complement by $A^c$.

Consider the random process Z

$$Z_k = \sqrt{P}\, H_k + W_k \quad \forall k \tag{53}$$

In any timeslot k, if the input signal for the channel (1) is $\sqrt{P}$, then the corresponding output
signal is given by (53). Otherwise, the output signal is just the white Gaussian noise term $W_k$.
A $\{0, \sqrt{P}\}$-valued signal X with finite energy can be expressed as $X = \sqrt{P}\, I_A$, where A is
the support set of X defined by $A = \{i : X_i \ne 0\}$, and $I_A$ denotes the indicator function of A.
Thus, A is the set of ON times of signal X, and |A| is the number of ON times of X.

Definition 6.1: Given a finite subset A of $\mathbb{Z}_+$, let

$$a(A) = \begin{cases} \log\det\left(I + P\,\mathrm{diag}(I_A)\,\Sigma\right) & \text{if } A \ne \emptyset \\ 0 & \text{if } A = \emptyset \end{cases}$$

Further, for any two finite sets $A, B \subset \mathbb{Z}$, define a(A|B) by

$$a(A|B) = a(A \cup B) - a(B)$$

It is easy to see that for $A \ne \emptyset$,

$$a(A) = h(Z_i : i \in A) - |A| \log(\pi e)$$

where h(.) is the differential entropy of the specified random variables. Note that the term
$-|A|\log(\pi e)$ in the definition of a is linear in |A|. Also, a(.|.) is related to the conditional
differential entropies of the random variables corresponding to the sets involved. Specifically,

$$a(A|B) = h(Z_i : i \in AB^c \mid Z_j : j \in B) - |AB^c| \log(\pi e)$$

We are interested in characterizing the optimal signaling scheme that would achieve the infima
in (52). Since the input signal is either $\sqrt{P}$ or 0, the expression inside the infima in (52) can be
simplified to $\frac{a(A)}{P|A|}$, where A is the set of indices of timeslots where the input signal is nonzero.
Thus, the expression for capacity per unit energy reduces from (52) to

$$C_p(P) = 1 - \inf_{A:\ A \text{ finite},\ A \ne \emptyset} \frac{a(A)}{P|A|} \tag{54}$$

Lemma 6.1: The functional a has the following properties.

1) $a(\emptyset) = 0$

2) If $C \subset D$, then $a(C) \le a(D)$. Consequently, $a(A) \ge 0$ for each $A \ne \emptyset$.

3) Two alternating capacity property: $a(A \cup B) + a(AB) \le a(A) + a(B)$

4) $a(B) = a(B + k)$ for each $k \in \mathbb{Z}$

Proof: The first property follows from the definition. From the definition of a(.|.),
$a(D) = a(DC^c|C) + a(C)$, so the second property is proved if $a(DC^c|C) \ge 0$. But,

$$a(DC^c|C) = h(Z_i : i \in DC^c \mid Z_j : j \in C) - |DC^c| \log(\pi e)$$
$$\overset{(a)}{\ge} h(Z_i : i \in DC^c \mid Z_j : j \in C,\ H_j : j \in DC^c) - |DC^c| \log(\pi e)$$
$$\overset{(b)}{=} h(W_{\{DC^c\}}) - |DC^c| \log(\pi e) = 0$$

where $W_{\{DC^c\}}$ denotes the vector composed of the random variables $\{W_i : i \in DC^c\}$. Here,
(a) follows from the fact that conditioning reduces differential entropy, while (b) follows from
the whiteness of the Gaussian noise process W.

Since the term $-|A|\log(\pi e)$ in the definition of a is linear in |A|, the third property for
a is equivalent to the same property for the set function $A \mapsto h(Z_i : i \in A)$. This equivalent
form of the third property is given by

$$h(Z_i : i \in AB^c \mid Z_j : j \in B) \le h(Z_i : i \in AB^c \mid Z_j : j \in AB)$$

But this is the well known property that conditioning on less information increases differential
entropy. This proves the third property. The fourth part of the lemma follows from the
stationarity of the random process H. ■
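
The properties in Lemma 6.1 can be spot-checked by brute force on a small example. The sketch below is not part of the proof: it assumes a Gauss-Markov correlation $\Sigma_{ij} = \rho^{|i-j|}$ as an illustrative fading model, uses hypothetical function names, and evaluates $a(A) = \log\det(I + P\Sigma_A)$ on every subset of $\{0, \ldots, n-1\}$, where $\Sigma_A$ is the principal submatrix of $\Sigma$ indexed by A.

```python
import itertools
import numpy as np

def gauss_markov_cov(n, rho):
    """Toeplitz correlation matrix with entries rho^|i-j| (assumed fading model)."""
    idx = np.arange(n)
    return rho ** np.abs(np.subtract.outer(idx, idx))

def a(A, sigma, P):
    """a(A) = log det(I + P * Sigma_A), with a(empty set) = 0."""
    A = sorted(A)
    if not A:
        return 0.0
    sub = sigma[np.ix_(A, A)]
    return float(np.linalg.slogdet(np.eye(len(A)) + P * sub)[1])

def check_lemma_properties(n=6, rho=0.8, P=2.0, tol=1e-10):
    """Check monotonicity (property 2) and a(A u B) + a(AB) <= a(A) + a(B) (property 3)."""
    sigma = gauss_markov_cov(n, rho)
    subsets = [frozenset(s) for r in range(n + 1)
               for s in itertools.combinations(range(n), r)]
    val = {s: a(s, sigma, P) for s in subsets}
    for A in subsets:
        for B in subsets:
            if A <= B and val[A] > val[B] + tol:
                return False
            if val[A | B] + val[A & B] > val[A] + val[B] + tol:
                return False
    return True

print(check_lemma_properties())
```

Property 4 (shift invariance) corresponds to the Toeplitz structure of the correlation matrix: translating a set leaves its principal submatrix unchanged.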

The only properties of a that are used in what follows are the properties listed in the
above lemma; i.e., in what follows, a could well be substituted with another functional $\beta$, as
long as $\beta$ satisfies the properties in Lemma 6.1.

Lemma 6.2: Let $A, B \subset \mathbb{Z}$ be finite disjoint nonempty sets. Then

$$\frac{a(A \cup B)}{|A \cup B|} \le \frac{a(A)}{|A|} \iff \frac{a(B|A)}{|B|} \le \frac{a(A \cup B)}{|A \cup B|} \iff \frac{a(B|A)}{|B|} \le \frac{a(A)}{|A|} \tag{55}$$

Proof: Trivially,

$$\frac{a(A) + a(B|A)}{|A| + |B|} = \frac{a(A \cup B)}{|A \cup B|}$$

Each individual term of the numerators and denominators in the above equation is nonnegative.

Note that for $a, b > 0$ and $c, d \ge 0$:

$$\frac{c + d}{a + b} \le \frac{c}{a} \iff \frac{d}{b} \le \frac{c + d}{a + b} \iff \frac{d}{b} \le \frac{c}{a}$$

Letting $a = |A|$, $b = |B|$, $c = a(A)$, $d = a(B|A)$, the lemma follows. ■
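
The elementary equivalence used in the proof (each of the three inequalities reduces to $ad \le bc$) can be verified exhaustively on a small integer grid. The following sketch uses exact rational arithmetic; the function names and the grid size are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import product

def equivalent_forms(a, b, c, d):
    """Truth values of the three inequalities for a, b > 0 and c, d >= 0."""
    mediant = Fraction(c + d, a + b)
    return (
        mediant <= Fraction(c, a),         # (c+d)/(a+b) <= c/a
        Fraction(d, b) <= mediant,         # d/b <= (c+d)/(a+b)
        Fraction(d, b) <= Fraction(c, a),  # d/b <= c/a
    )

def all_agree(limit=6):
    """Check that the three statements always share one truth value on the grid."""
    for a, b in product(range(1, limit), repeat=2):
        for c, d in product(range(0, limit), repeat=2):
            if len(set(equivalent_forms(a, b, c, d))) != 1:
                return False
    return True

print(all_agree())
```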
Let E, F be two nonempty subsets of $\mathbb{Z}$ with finite cardinality. E is defined to be better
than F if

$$\frac{a(E)}{|E|} \le \frac{a(F)}{|F|}$$

Lemma 6.3: Let $A \subset \mathbb{Z}$ be nonempty with finite cardinality. Suppose $\frac{a(A)}{|A|} \le \frac{a(C)}{|C|}$ for all
nonempty proper subsets C of A. Suppose B is a set of finite cardinality such that $\frac{a(B)}{|B|} \le \frac{a(A)}{|A|}$.
So, B is better than A, and A is better than any nonempty subset of A. Then, for any integer k,

$$\frac{a(\tilde{A} \cup B)}{|\tilde{A} \cup B|} \le \frac{a(A)}{|A|}$$

where $\tilde{A} = A + k$, i.e., $\tilde{A}$ is obtained by incrementing every element of A by k.

Proof: It suffices to prove the result for $\tilde{A} = A$, for otherwise B can be suitably translated.
Let $D = BA^c$ and $\tilde{D} = BA$. The set B is better than A, and hence better than any subset of
A. In particular, B is better than $\tilde{D}$. B is the union of the two disjoint sets D and $\tilde{D}$. Applying
Lemma 6.2 to $\tilde{D}$ and D yields $\frac{a(D|\tilde{D})}{|D|} \le \frac{a(B)}{|B|}$. Since $\frac{a(B)}{|B|} \le \frac{a(A)}{|A|}$, it follows that $\frac{a(D|\tilde{D})}{|D|} \le \frac{a(A)}{|A|}$.
The fact that $\tilde{D}$ is a subset of A, and the third property of a applied to A and B, together
imply that $\frac{a(D|A)}{|D|} \le \frac{a(D|\tilde{D})}{|D|} \le \frac{a(A)}{|A|}$. Consequently, application of Lemma 6.2 to the disjoint sets
A and D yields that $\frac{a(A \cup D)}{|A \cup D|} \le \frac{a(A)}{|A|}$, which is equivalent to the desired conclusion. ■

Proposition 6.1: The following holds.

$$\inf_{A:\ A \text{ finite},\ A \ne \emptyset} \frac{a(A)}{|A|} = \lim_{n\to\infty} \frac{a(\{1, \ldots, n\})}{n}$$

Proof: Let $\epsilon > 0$. Then, there exists a finite nonempty set $A_\epsilon$ with

$$\frac{a(A_\epsilon)}{|A_\epsilon|} \le a^* + \epsilon \quad \text{where } a^* = \inf_{A:\ A \text{ finite},\ A \ne \emptyset} \frac{a(A)}{|A|}$$

Let $A^*$ be a smallest cardinality nonempty subset of $A_\epsilon$ satisfying the inequality

$$\frac{a(A^*)}{|A^*|} \le a^* + \epsilon$$

Then

$$\frac{a(A^*)}{|A^*|} \le \frac{a(A)}{|A|} \quad \text{for any } A \subset A^* \text{ with } A \ne \emptyset$$

Let $S_1 = A^*$. For k > 1, let $S_k = A^* \cup (A^* + 1) \cup \ldots \cup (A^* + k - 1)$. For $k \ge 1$, let $T_k$ be
the claim that $S_k$ is better than $A^*$. The claim $T_1$ is trivially true. For the sake of argument by
induction, suppose $T_k$ is true for some $k \ge 1$. Choose $B = S_k$, and $\tilde{A} = A^* + k$ and apply
Lemma 6.3. This proves the claim $T_{k+1}$. Hence, by induction, $T_k$ is true for all $k \in \mathbb{N}$. So, for any
$k \in \mathbb{N}$, $A^* \cup (A^* + 1) \cup \ldots \cup (A^* + k - 1)$ is better than $A^*$. Roughly speaking, any gaps
in the set $S_k$ are removed as k increases. So, for every $\epsilon$, we can find an $n_\epsilon$ so that for all $n > n_\epsilon$,
$A_n = \{1, \ldots, n\}$ satisfies:

$$a^* \le \frac{a(A_n)}{|A_n|} \le a^* + \epsilon \quad \forall n > n_\epsilon$$

Hence the proposition is proved. ■

Equation (54) and Proposition 6.1 imply that the capacity per unit energy is given by the
following limit:

$$C_p(P) = 1 - \lim_{n\to\infty} \frac{a(\{1, \ldots, n\})}{nP}
= 1 - \lim_{n\to\infty} \frac{\log\det(I_n + P\Sigma_{n \times n})}{nP} \tag{56}$$

At this point, it may be worthwhile to comment on the structure of a signaling scheme for
achieving $C_p(P)$ for the original channel. The structure of codes achieving capacity per unit
energy for a memoryless channel with a zero cost symbol is outlined in [1]. This, together with
Propositions 2.2 and 6.1, shows that $C_p(P)$ can be asymptotically achieved by codes where each
codeword W has the following structure for constants N, T, d with $N \gg 1$ and $1 \ll T \ll d$:

• Codeword length is $N(T + d)$.

• $W_i \in \{0, \sqrt{P}\}$ for all $0 \le i < N(T + d)$.

• $W_i$ is constant over intervals of the form $[k(T + d),\ k(T + d) + T - 1]$.

• $W_i$ is zero over intervals of the form $[k(T + d) + T,\ (k + 1)(T + d) - 1]$.

So, the vast majority of codeword symbols are OFF, with infrequent long bursts of ON symbols.
This is referred to as temporal ON-OFF signaling.
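
The codeword structure listed above can be illustrated with a short sketch. The function name and the `on_blocks` argument (a per-block flag saying whether the burst in block k is ON) are hypothetical; how those flags are chosen for each message is a matter for the code design in [1] and is not modeled here.

```python
import numpy as np

def temporal_on_off_codeword(on_blocks, T, d, P):
    """Build one codeword of length N*(T+d), where N = len(on_blocks).

    Block k occupies indices [k(T+d), (k+1)(T+d) - 1]. Its first T symbols are
    all sqrt(P) if on_blocks[k] is true and all 0 otherwise; the trailing d
    symbols form a guard interval that is always 0.
    """
    n_blocks = len(on_blocks)
    w = np.zeros(n_blocks * (T + d))
    for k, on in enumerate(on_blocks):
        if on:
            start = k * (T + d)
            w[start:start + T] = np.sqrt(P)
    return w

w = temporal_on_off_codeword([True, False, True], T=3, d=5, P=4.0)
print(len(w), float(np.sum(w ** 2)))
```

In the regime of interest, d is much larger than T, so the guard intervals dominate the codeword and the ON bursts in different blocks see nearly independent fading.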



C. Identifying the limit 

We shall show that (56) is equivalent to (17)-(18).

Let Z be the process defined by (53). Consider the problem of estimating the value
of Z(0) by observing the previous n random variables $\{Z(k) : -n \le k < 0\}$ such that the
mean square error is minimized. Since Z is a proper complex Gaussian process, the minimum
mean square error estimate of Z(0) is linear [22, Chapter IV.8 Theorem 2], and it is denoted
by $\hat{Z}(0|-1, \ldots, -n)$. Let $\sigma^2_{0|-1,\ldots,-n} = E\left[\left|Z(0) - \hat{Z}(0|-1, \ldots, -n)\right|^2\right]$ be the mean square
error. Let $D_n$ denote the determinant $\det(I_n + P\Sigma_n)$. Note that $D_n \ge 1$ for all n since $\Sigma_n$, being
an autocorrelation matrix, is positive semidefinite for all n.
Lemma 6.4: The minimum mean square error in predicting Z(0) from $\{Z(k) : -n \le k < 0\}$
is given by

$$\sigma^2_{0|-1,\ldots,-n} = \frac{D_{n+1}}{D_n}$$

Proof: The random variables $Z_{-n}^{0} = \{Z(k) : -n \le k \le 0\}$ are jointly proper complex
Gaussian and have the following expression for differential entropy.

$$h(Z_{-n}^{0}) = \log\left((\pi e)^{n+1} D_{n+1}\right)$$

Let $\tilde{Z}(0|-1, \ldots, -n) = Z(0) - \hat{Z}(0|-1, \ldots, -n)$ denote the prediction error. The differential
entropy of $\tilde{Z}(0|-1, \ldots, -n)$ is the conditional entropy $h(Z(0) \mid Z_{-n}^{-1})$, which
can be expressed in terms of $D_{n+1}$ and $D_n$ as follows.

$$h(\tilde{Z}(0|-1, \ldots, -n)) = h(Z(0) \mid Z_{-n}^{-1})$$
$$\overset{(a)}{=} h(Z_{-n}^{0}) - h(Z_{-n}^{-1})$$
$$= \log\left((\pi e)^{n+1} D_{n+1}\right) - \log\left((\pi e)^{n} D_n\right) = \log\left(\pi e\, \frac{D_{n+1}}{D_n}\right)$$

where (a) follows from the fact that, for any two random vectors U and V, the conditional entropy
$h(U|V) = h(U, V) - h(V)$. Since $\tilde{Z}(0|-1, \ldots, -n)$ is a linear combination of proper complex
jointly Gaussian random variables, it is also proper complex Gaussian. Hence, its differential
entropy is given by

$$h(\tilde{Z}(0|-1, \ldots, -n)) = \log\left(\pi e\, \sigma^2_{0|-1,\ldots,-n}\right) \tag{57}$$

The lemma follows by equating the above two expressions for the differential entropy of
$\tilde{Z}(0|-1, \ldots, -n)$. ■
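
Lemma 6.4 can be checked numerically in a simple case. The sketch below assumes, purely for illustration, a real-valued Gauss-Markov correlation $\Sigma_{ij} = \rho^{|i-j|}$ for the fading process, so that the covariance of $(Z(-n), \ldots, Z(0))$ is $I_{n+1} + P\Sigma_{n+1}$. It compares the determinant ratio $D_{n+1}/D_n$ with the linear prediction error computed directly as a Schur complement. The function names are hypothetical.

```python
import numpy as np

def gauss_markov_cov(n, rho):
    """Toeplitz correlation matrix with entries rho^|i-j| (assumed fading model)."""
    idx = np.arange(n)
    return rho ** np.abs(np.subtract.outer(idx, idx))

def det_ratio(n, rho, P):
    """D_{n+1} / D_n with D_m = det(I_m + P * Sigma_m), computed via log-determinants."""
    _, ld_next = np.linalg.slogdet(np.eye(n + 1) + P * gauss_markov_cov(n + 1, rho))
    _, ld_curr = np.linalg.slogdet(np.eye(n) + P * gauss_markov_cov(n, rho))
    return float(np.exp(ld_next - ld_curr))

def prediction_mmse(n, rho, P):
    """Error variance of the best linear predictor of Z(0) from Z(-n), ..., Z(-1)."""
    cov = np.eye(n + 1) + P * gauss_markov_cov(n + 1, rho)  # order: Z(-n), ..., Z(0)
    past = cov[:n, :n]      # covariance of the observed past
    cross = cov[:n, n]      # covariance between the past and Z(0)
    var0 = cov[n, n]        # variance of Z(0)
    return float(var0 - cross @ np.linalg.solve(past, cross))

for n in (1, 5, 20):
    print(n, det_ratio(n, 0.9, 2.0), prediction_mmse(n, 0.9, 2.0))
```

The two columns agree up to round-off, and the values decrease with n, consistent with the prediction error being non-increasing in the number of observations.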



The n-step mean square prediction error $\sigma^2_{0|-1,\ldots,-n}$ is non-increasing in n, since projecting
onto a larger space can only reduce the mean square error. So, the prediction error of Z(0) given
$(Z(-1), Z(-2), \ldots)$ is the limit of the sequence of the n-step prediction errors.

$$\lim_{n\to\infty} \sigma^2_{0|-1,\ldots,-n} = \sigma^2_{0|-1,-2,\ldots} \tag{58}$$

It follows from Lemma 6.4 that the ratio of the determinants, $D_{n+1}/D_n$, converges to the
prediction error of Z(0) given $(Z(-1), Z(-2), \ldots)$.

$$\lim_{n\to\infty} \frac{D_{n+1}}{D_n} = \sigma^2_{0|-1,-2,\ldots}$$

The sequence $D_n^{1/n}$ also converges, since $D_{n+1}/D_n$ converges, and it converges to the same limit as
$D_{n+1}/D_n$.

$$\lim_{n\to\infty} D_n^{1/n} = \sigma^2_{0|-1,-2,\ldots} \tag{59}$$
Let $\{F_Z(\omega) : -\pi \le \omega < \pi\}$ be the spectral distribution function of the process Z.
Returning to the prediction problem, the mean square prediction error $\sigma^2_{0|-1,-2,\ldots}$ can be expressed
in terms of the density function of the absolutely continuous component of the power spectral
measure of the process Z [23, Chapter XII.4 Theorem 4.3].

$$\sigma^2_{0|-1,-2,\ldots} = \exp\left( \int_{-\pi}^{\pi} \log F_Z'(\omega)\, \frac{d\omega}{2\pi} \right) \tag{60}$$

From (59), we know that the log det term in the capacity per unit energy expression (56)
converges to $\log(\sigma^2_{0|-1,-2,\ldots})$. Equation (60) relates the mean square prediction error of a wide
sense stationary process to the spectral measure of the process. This lets us simplify the log det
term into an integral involving the power spectral density $S(\omega)$ of the fading process H. We
state and prove the following lemma.

Lemma 6.5:

$$I(P) = \lim_{n\to\infty} \frac{1}{n} \log\det(I_n + P\Sigma_n)$$

Proof: Let $\{F_H(\omega), F_W(\omega) : -\pi \le \omega < \pi\}$ be the spectral distribution functions of the
processes H and W respectively.

$$F_Z(\omega) = P \cdot F_H(\omega) + F_W(\omega)$$

The density of the absolutely continuous part of the power spectral measure of the
fading process H is given by $S(\omega)$. Since W is white Gaussian, its spectral distribution $F_W$
is absolutely continuous with density 1. Hence the density $F_Z'$ of the absolutely continuous
component of $F_Z$ is given by

$$F_Z'(\omega) = 1 + P \cdot S(\omega) \tag{61}$$

The expression for the mean square prediction error $\sigma^2_{0|-1,-2,\ldots}$ in (60) involves the density
function $F_Z'$. Substituting the density function by the expression in (61), we get

$$\sigma^2_{0|-1,-2,\ldots} = \exp\left( \int_{-\pi}^{\pi} \log(1 + P \cdot S(\omega))\, \frac{d\omega}{2\pi} \right) \tag{62}$$

From (18) and (62), it follows that

$$\sigma^2_{0|-1,-2,\ldots} = e^{I(P)} \tag{63}$$

The lemma follows from equating the expressions for $\sigma^2_{0|-1,-2,\ldots}$ in (59) and (63). ■
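
As an illustration of Lemma 6.5, the sketch below compares $\frac{1}{n}\log\det(I_n + P\Sigma_n)$ with the spectral integral $\int_{-\pi}^{\pi}\log(1 + PS(\omega))\,\frac{d\omega}{2\pi}$. It assumes a Gauss-Markov fading process with correlation $\rho^{|k|}$ and spectral density $S(\omega) = (1-\rho^2)/(1 - 2\rho\cos\omega + \rho^2)$; this model, the function names, and the grid size are illustrative choices. The last helper evaluates $1 - I(P)/P$, the capacity-per-unit-energy expression that the lemma leads to.

```python
import numpy as np

def gauss_markov_cov(n, rho):
    """Toeplitz correlation matrix with entries rho^|i-j|."""
    idx = np.arange(n)
    return rho ** np.abs(np.subtract.outer(idx, idx))

def gauss_markov_psd(omega, rho):
    """Power spectral density of the correlation sequence rho^|k|."""
    return (1.0 - rho ** 2) / (1.0 - 2.0 * rho * np.cos(omega) + rho ** 2)

def I_spectral(P, rho, m=4096):
    """Approximate (1/2pi) * integral of log(1 + P S(w)) over [-pi, pi) on a uniform grid."""
    omega = -np.pi + 2.0 * np.pi * np.arange(m) / m
    return float(np.mean(np.log1p(P * gauss_markov_psd(omega, rho))))

def I_logdet(P, rho, n):
    """(1/n) * log det(I_n + P Sigma_n)."""
    _, logdet = np.linalg.slogdet(np.eye(n) + P * gauss_markov_cov(n, rho))
    return float(logdet / n)

def capacity_per_unit_energy(P, rho):
    """Evaluate 1 - I(P)/P for the assumed Gauss-Markov model."""
    return 1.0 - I_spectral(P, rho) / P

for n in (10, 100, 400):
    print(n, I_logdet(1.0, 0.5, n), I_spectral(1.0, 0.5))
print(capacity_per_unit_energy(1.0, 0.5))
```

The normalized log-determinant approaches the spectral integral as n grows, with a gap that shrinks roughly like 1/n.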

Let $\bar{I}(P)$ be the mutual information rate between the fading process H and Z, when the
input is identically $\sqrt{P}$, as modeled in (53). It is interesting to note that $\bar{I}(P)$ is related to $I(P)$
in the following manner.

$$\bar{I}(P) = \lim_{n\to\infty} \frac{1}{n} I(Z_{-1} \ldots Z_{-n};\ H_{-1} \ldots H_{-n}) \tag{64}$$

$$= \lim_{n\to\infty} \frac{1}{n} h(Z_{-1} \ldots Z_{-n}) - \frac{1}{n} h(W_{-1} \ldots W_{-n}) \tag{65}$$

The first term in (65) is the entropy rate $h_Z$ of the process Z. Following (57), (58) and (62), $h_Z$
is given by

$$h_Z = \log(\pi e) + I(P) \tag{66}$$

The second term in (65) is the entropy rate of the white Gaussian process W, given by $\log(\pi e)$.
From (65) and (66), it follows that the mutual information rate $\bar{I}(P)$ is equal to $I(P)$.

We briefly outline an alternative way to prove Lemma 6.5 in Appendix IV. Additional
material on the limiting distribution of eigenvalues of Toeplitz matrices can be found in [24,
Section 8.5]. Lemma 6.5 is used to simplify the capacity expression in (56). Using the above
simplification, the capacity per unit energy is given by

$$C_p(P) = 1 - \frac{I(P)}{P} \tag{67}$$

This proves Proposition 3.1.



VII. EXTENSION TO CHANNELS WITH SIDE INFORMATION: PROOF OF PROPOSITION 3.2



Considering CSI at the receiver as part of the output, the channel output can be represented
by

$$\hat{Y}(k) = \begin{pmatrix} X(k)H(k) + W(k) \\ H(k) \end{pmatrix} \tag{68}$$

Since H and W are stationary and weakly mixing, and the processes H, W and X are mutually
independent, it can be shown that the above channel is stationary and ergodic. Propositions 2.1
and 2.2 can be extended to hold for the above channel. Recall that $C_p^{coh}(P)$ denotes the capacity
per unit energy of this channel under a peak constraint P.

Let $\tilde{H}$ denote a fading process where blocks of length T, $(\tilde{H}(1+kT), \ldots, \tilde{H}(T+kT))$,
indexed in $k \in \mathbb{Z}$, are independent, and each block has the same distribution as $(H(1), H(2), \ldots, H(T))$.
A channel with the above fading process and with CSI at the receiver can be represented by

$$\tilde{Y}(k) = \begin{pmatrix} X(1 + kT)\tilde{H}(1 + kT) + W(1 + kT) \\ \vdots \\ X(T + kT)\tilde{H}(T + kT) + W(T + kT) \\ \tilde{H}(1 + kT) \\ \vdots \\ \tilde{H}(T + kT) \end{pmatrix} \tag{69}$$

with input $(X(1 + kT), \ldots, X(T + kT))$ and output $\tilde{Y}(k)$. Let $C_{p,T}^{coh}(P)$ denote the capacity per
unit energy of this channel. Using a simple extension of (46) in Section V-B, it can be shown
that

$$C_p^{coh}(P) = \lim_{T\to\infty} C_{p,T}^{coh}(P) \tag{70}$$

Lemma 7.1: For each P > 0 and T > 0,

$$C_{p,T}^{coh}(P) = 1 \tag{71}$$

For a proof of Lemma 7.1 see Appendix V. Proposition 3.2 follows from (70) and Lemma 7.1.

VIII. EXTENSION TO CONTINUOUS TIME: PROOF OF PROPOSITION 3.3

The proof of Proposition 3.3 is organized as follows. The capacity per unit energy with
peak constraint of the given continuous-time channel is shown to be the limit of that of a
discrete-time channel, suitably constructed from the original continuous-time channel. A similar
approach is used in [25] in the context of direct detection photon channels. The limit is then
evaluated to complete the proof.

Recall that the observed signal (20) is given by

$$Y(t) = H(t)X(t) + W(t), \quad 0 \le t \le T,$$

where X(t) is the input signal. Here, W(t) is a complex proper Gaussian white noise process.
The fading process H(t) is a stationary proper complex Gaussian process. The observed integral
process (21) is then

$$V(t) = \int_0^t H(s)X(s)\,ds + \eta(t), \quad 0 \le t \le T,$$

where $\eta$ is a standard proper complex Wiener process with autocorrelation function
$E[\eta(s)\eta^*(t)] = \min\{s, t\}$.

For an integer $J \ge 1$, a codeword X is said to be in class J if X is constant on intervals
of the form $(i2^{-J}, (i + 1)2^{-J}]$. A codeword is said to be a finite class codeword if it is in class
J for some finite J. Note that a class J codeword is also a class J' codeword for any J' > J.
Given an integer $K \ge 1$, a decoder is said to be in class K if it makes its decisions based only
on the observations $Y^K = (Y^K(i) : i \ge 0)$, where

$$Y^K(i) = \int_{i2^{-K}}^{(i+1)2^{-K}} Y(t)\,dt = V((i + 1)2^{-K}) - V(i2^{-K}). \tag{72}$$

Note that a class K decoder is also a class K' decoder for any K' > K. Let $C_p^{J,K}(P)$ denote the
capacity per unit energy with peak constraint P when only class J codewords and class K
decoders are permitted to be used.
Observe that, taking J = K, if a code consists of class K codewords and if a class
K decoder is used, then the communication system is equivalent to a discrete time system.
Therefore, it is possible to identify $C_p^{K,K}(P)$ using Proposition 3.1.

Note that $C_p(P) \ge C_p^{J,K}(P)$ for any finite J and K, because imposing restrictions on the
codewords and decoder cannot increase capacity. For the same reason, $C_p^{J,K}(P)$ is non-decreasing
in J and in K. Letting J = K and taking the limit $K \to \infty$ yields

$$C_p(P) \ge \lim_{K\to\infty} C_p^{K,K}(P). \tag{73}$$

The proof is completed by showing that $C_p(P) = \lim_{K\to\infty} C_p^{K,K}(P)$, and then identifying the
limit on the right as the expression for capacity per unit energy given in the proposition.

Lemma 8.1: $C_p(P) = \sup_{X \in L^2[0,\infty)} \frac{D(P_{V|X} \,\|\, P_{V|0})}{\|X\|_2^2}$.

Proof: The continuous-time channel is equivalent to a discrete-time abstract alphabet
channel with input alphabet $L^2[0, 1]$ and output alphabet $C_0[0, 1]$, the space of complex-valued
continuous functions on the interval [0, 1] with initial value zero. For convenience, let T be a
positive integer. Then an input signal $(X(t) : 0 \le t \le T)$ is equivalent to the discrete-time signal
$(X_0, \ldots, X_{T-1})$, where $X_i$ are functions in $L^2[0, 1]$ defined by $X_i(s) = X(s + i)$, $0 \le s \le 1$.
Similarly the output signal $(V(t) : 0 \le t \le T)$ is equivalent to the discrete-time signal
$(V_0, \ldots, V_{T-1})$, where $V_i(s) = V(s + i) - V(i)$, $0 \le s \le 1$. Propositions 2.1 and 2.2
generalize to this discrete-time channel with the same proofs, yielding the lemma. ■

Lemma 8.2: The divergence $D(P_{V|X} \,\|\, P_{V|0})$ as a function of X, which maps $L^2[0,\infty)$
to $[0,\infty)$, is lower semi-continuous.

Proof: Let $P_{V|X,H}$ denote the distribution of V given (X, H). Let $P_{V|\tilde{X},H}$ be defined
similarly. Given (X, H), as shown in (21), V is simply given by the integral of a known signal
XH plus a standard proper complex Wiener process. Consequently, the well-known Cameron-
Martin formula for likelihood ratios can be used to find $D(P_{V|X,H} \,\|\, P_{V|\tilde{X},H}) = \|XH - \tilde{X}H\|_2^2$.
The measure $P_{V|X}$ is obtained from $P_{V|X,H}$ by integrating out H. Namely, for any Borel set
A in the space of V, $P_{V|X}[A] = E_H[P_{V|X,H}[A]]$. A similar relation holds for $P_{V|\tilde{X}}$. Also, the
divergence measure $D(\cdot \,\|\, \cdot)$ is jointly convex in its arguments. Therefore, by Jensen's inequality,

$$D(P_{V|X} \,\|\, P_{V|\tilde{X}}) \le E_H\left[D(P_{V|X,H} \,\|\, P_{V|\tilde{X},H})\right] \tag{74}$$

$$= E_H\left[\|XH - \tilde{X}H\|_2^2\right] \tag{75}$$

$$= E_H\left[\int_0^T |X(t) - \tilde{X}(t)|^2\, |H(t)|^2\, dt\right] \tag{76}$$

$$= \|X - \tilde{X}\|_2^2 \tag{77}$$

The $L^1$ or variational distance between two probability measures is bounded by their divergence:
namely, $\|P - Q\|_1 \le \sqrt{2D(P \,\|\, Q)}$ [26, Lemma 16.3.1]. So

$$\|P_{V|X} - P_{V|\tilde{X}}\|_1 \le \sqrt{2}\, \|X - \tilde{X}\|_2. \tag{78}$$

In particular, $P_{V|X}$ as a function of X is a continuous mapping from the space $L^2[0, \infty)$ to the
space of measures with the $L^1$ metric. The proof of the lemma is completed by invoking the fact
that the divergence function $D(P \,\|\, Q)$ is lower semi-continuous in (P, Q) under the $L^1$ metric
(see theorem of Gelfand, Yaglom, and Perez [14, (2.4.9)]). ■

Lemma 8.3: $C_p(P) = \lim_{K\to\infty} C_p^{K,K}(P)$

Proof: Proposition 2.2 applied to the discrete-time channel that results from the use of
class K codes and class K decoders yields:

$$C_p^{K,K}(P) = \sup_{X \text{ of class } K} \frac{D(P_{Y^K|X} \,\|\, P_{Y^K|0})}{\|X\|_2^2}. \tag{79}$$

Lemmas 8.1 and 8.2 and the fact that finite class signals are dense in the space of all square
integrable signals imply that

$$C_p(P) = \lim_{K\to\infty}\ \sup_{X \text{ of class } K} \frac{D(P_{V|X} \,\|\, P_{V|0})}{\|X\|_2^2}. \tag{80}$$

Let $\mathcal{F}_K$ denote the $\sigma$-algebra generated by the entire observation process $(Y^K(i) : i \ge 0)$, or
equivalently, by $(V(2^{-K}i) : i \ge 0)$. Then $\mathcal{F}_K$ is increasing in K, and the smallest $\sigma$-algebra
containing $\mathcal{F}_K$ for all K is $\mathcal{F}_\infty$, the $\sigma$-algebra generated by the observation process V. Therefore,
by a property of the divergence measure (see Dobrushin's theorem [14, (2.4.6)]), for any fixed
signal X, $D(P_{V|X} \,\|\, P_{V|0}) = \lim_{K\to\infty} D(P_{Y^K|X} \,\|\, P_{Y^K|0})$. Applying this observation to (80) yields

$$C_p(P) = \lim_{K\to\infty}\ \sup_{X \text{ of class } K} \frac{D(P_{Y^K|X} \,\|\, P_{Y^K|0})}{\|X\|_2^2} \tag{81}$$

Combining (79) and (81) yields the lemma. ■

Lemma 8.4: $\lim_{K\to\infty} C_p^{K,K}(P)$ is given by the formula for $C_p(P)$ in Proposition 3.3.

Proof: Let $\mathcal{C}$ be a $(T, M, P, \epsilon)$ code for the continuous time channel with class K
codewords. Let $T = n2^{-K}$ for some $n \in \mathbb{N}$. Fix a codeword

$$X(t) = \sqrt{2^K} \sum_{i=0}^{n-1} a_i\, u(2^K t - i)$$

where $u(t) = I_{\{t \in [0,1]\}}$.

An equivalent discrete time system is constructed using a matched filter at the output,
followed by a sampler that generates $2^K$ samples per second, as shown in Figure 7. The matched
filter $g_K(t)$ is given by

$$g_K(t) = \sqrt{2^K}\, u(-t2^K) \tag{82}$$
Fig. 7. Matched filter response is sampled at rate $2^K$ Hz.

Fig. 8. The filter response of the channel process H(t) is sampled (at $2^K$ Hz) to generate the discrete time process $H_K$.


The equivalent system is

$$Z(i) = a_i H_K(i) + W(i) \tag{83}$$

Here, the discrete-time process $H_K$ is a proper complex Gaussian process defined as the filter
response of the channel process H(t), sampled at $2^K$ Hz, as shown in Figure 8. The noise process
W is an i.i.d. proper complex Gaussian process with zero mean and unit variance. The input
codeword X(t) in the continuous time system corresponds to an input codeword $a = (a_1, \ldots, a_n)$
for the discrete-time system (83).

The codebook $\mathcal{C}$ corresponds to an $(n, M, \nu, P2^{-K}, \epsilon)$ code for the channel $H_K$. Thus,
$C_p^{K,K}(P)$ is the capacity per unit energy $C_p(P2^{-K})$ of the discrete-time channel process $H_K$
with peak constraint $P2^{-K}$.

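It is easy to see that the spectral density $\{S_K(\omega) : -\pi \le \omega < \pi\}$ of the process $H_K$ is
given by:

$$S_K(\omega) = 2^K \sum_{n=-\infty}^{\infty} S\left(2^K(\omega - 2\pi n)\right) \mathrm{sinc}^2(\omega - 2\pi n) \tag{84}$$

where

$$\mathrm{sinc}(\omega) = \frac{\sin(\omega/2)}{\omega/2}.$$

Let $h_K = R_{H_K}(0)$ where $R_{H_K}$ is the autocorrelation function of the process $H_K$.

Claim 8.1: $\lim_{K\to\infty} h_K$ exists and equals 1.

Proof: Clearly $h_K = \int_{-\pi}^{\pi} S_K(\omega)\, \frac{d\omega}{2\pi}$. From (84), it follows that

$$h_K = \int_{-\infty}^{\infty} S(\omega)\, \mathrm{sinc}^2(\omega 2^{-K})\, \frac{d\omega}{2\pi}.$$

Claim 8.1 follows from the Dominated Convergence Theorem, noting that, for any $\omega$,
$\lim_{K\to\infty} S(\omega)\,\mathrm{sinc}^2(\omega 2^{-K}) = S(\omega)$ and $\mathrm{sinc}^2(\omega 2^{-K}) \le 1$. ■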

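Let $I_K$ be defined as follows.

$$I_K = 2^K \int_{-\pi}^{\pi} \log\left(1 + P2^{-K} S_K(\omega)\right) \frac{d\omega}{2\pi} \tag{85}$$

By Claim 8.1, it follows that

$$\lim_{K\to\infty} C_p^{K,K}(P) = 1 - \frac{1}{P} \lim_{K\to\infty} I_K \tag{86}$$

Claim 8.2: $\lim_{K\to\infty} I_K = \int_{-\infty}^{\infty} \log(1 + PS(\omega))\, \frac{d\omega}{2\pi}$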

Proof: Substituting for Sk (EH) in the expression for Ik in (l85t yields 

uj ,\ duo 



log 1 + P 5^ 5(c 

■'^a^' V n=-oo 



= / log I i + 7 , - 27rn2^)sinc2(^ - 2?™) J — 



Fatou's Lemma yields the following lower bound.

$$\liminf_{K\to\infty} I_K \ge \int_{-\infty}^{\infty} \log(1 + PS(\omega))\,\frac{d\omega}{2\pi} \tag{87}$$

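The following upper bound on $I_K$ follows from the fact that for any $x_1 \ge 0$, $x_2 \ge 0$,
$\log(1 + x_1 + x_2) \le \log(1 + x_1) + x_2$, and $\mathrm{sinc}^2(x) \le 1$ for each $x$:

$$\begin{aligned}
I_K &\le \int_{-\pi 2^K}^{\pi 2^K} \log\!\big(1 + PS(\omega)\,\mathrm{sinc}^2(\omega 2^{-K})\big)\frac{d\omega}{2\pi} + P\int_{|\omega| \ge \pi 2^K} S(\omega)\,\mathrm{sinc}^2(\omega 2^{-K})\,\frac{d\omega}{2\pi} \\
&\le \int_{-\infty}^{\infty} \log(1 + PS(\omega))\,\frac{d\omega}{2\pi} + P\int_{|\omega| \ge \pi 2^K} S(\omega)\,\frac{d\omega}{2\pi}.
\end{aligned}$$

Since $S$ is integrable, the last term vanishes as $K \to \infty$. It follows that

$$\limsup_{K\to\infty} I_K \le \int_{-\infty}^{\infty} \log(1 + PS(\omega))\,\frac{d\omega}{2\pi}. \tag{88}$$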

From (87) and (88), $\lim_{K\to\infty} I_K$ exists and Claim 8.2 is proved. ■

Claim 8.2 and (86) complete the proof of Lemma 8.4.

The validity of Proposition 3.3 is implied by Lemmas 8.3 and 8.4. The proof of Proposition 3.3
is complete.
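
The convergence in Claim 8.2 can be illustrated numerically. The following is a minimal sketch, not part of the proof. It assumes the example spectral density $S(\omega) = 2a/(a^2+\omega^2)$ (autocorrelation $e^{-a|\tau|}$), for which the limit has the closed form $\sqrt{a^2 + 2aP} - a$. The function names, the truncation of the aliasing sum to $|n| \le 5$, and the grid step are illustrative choices.

```python
import numpy as np

def sinc_half(x):
    """Convention of (84): sinc(x) = sin(x/2) / (x/2)."""
    # np.sinc(u) computes sin(pi*u)/(pi*u), so the argument is rescaled.
    return np.sinc(x / (2.0 * np.pi))

def spectral_density(omega, a=1.0):
    """Assumed example: S(w) = 2a / (a^2 + w^2); it integrates to 1 against dw/2pi."""
    return 2.0 * a / (a**2 + omega**2)

def I_K(K, P, a=1.0, n_alias=5, step=0.02):
    """I_K in the form obtained after the change of variables in the proof."""
    W = np.pi * 2.0**K
    m = int(np.ceil(2.0 * W / step)) + 1
    omega = np.linspace(-W, W, m)
    folded = np.zeros_like(omega)
    for n in range(-n_alias, n_alias + 1):  # truncated aliasing sum (assumption)
        folded += (spectral_density(omega - 2.0 * np.pi * n * 2.0**K, a)
                   * sinc_half(omega / 2.0**K - 2.0 * np.pi * n) ** 2)
    integrand = np.log1p(P * folded)
    dx = omega[1] - omega[0]
    # Trapezoidal rule, normalised by 2*pi.
    total = dx * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return total / (2.0 * np.pi)

def limit_value(P, a=1.0):
    """Closed form of the integral of log(1 + P*S) dw/2pi for the assumed S."""
    return np.sqrt(a**2 + 2.0 * a * P) - a

for K in (2, 4, 6, 8):
    print(K, I_K(K, 1.0), limit_value(1.0))
```

The gap between `I_K(K, P)` and `limit_value(P)` should shrink as `K` grows, in line with the bounds (87) and (88).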
IX. CONCLUSION

This paper provides a simple expression for the capacity per unit energy of a discrete-time 
Rayleigh fading channel with a hard peak constraint on the input signal. The fading process is 
stationary and can be correlated in time. There is no channel state information at the transmitter 
or the receiver. The capacity per unit energy for the non-coherent channel is shown to be that 
of the channel with coherence minus a penalty term corresponding to the rate of learning the 
channel at the output. Further, ON-OFF signaling is found to be sufficient for achieving the 
capacity per unit energy. Similar results are obtained for continuous-time channels also. One 
application for capacity per unit energy is to bound from above the capacity per unit time. 
Upper bounds to capacity per unit time are plotted for channels with Gauss Markov fading. 

A possible extension of this paper is to a multiple antenna (MIMO) scenario. While the 
results may extend in a straightforward fashion to parallel independent channels, extension to 
more general MIMO channels seems non-trivial. Also, the fading could be correlated both in 
time and across antennas. Suitable models of fading channels that abstract such correlation need 
to be constructed. Another possible extension of this paper is to consider more general fading 
models such as the WSSUS fading model used in [4,5]. This would let us explore the effect of 
multipath or inter-symbol interference on capacity in the low SNR regime. 

Appendix I 

BOUNDING CAPACITY PER UNIT ENERGY USING FOURTHEGY 

We bound the capacity per unit energy for the channel in (1) by applying a bound of
Medard and Gallager [4]. In the terminology of [5], this amounts to bounding the fourthegy
using the given average and peak power constraints, and using the expression for capacity per
unit fourthegy.

Let $\tilde{H}$ denote the block fading process such that blocks of length $T$ are independent,
with each block having the same probability distribution as $(H(1), H(2), \ldots, H(T))$. Denote $T$
consecutive uses of a channel with fading process $\tilde{H}$ by the following:

$$\tilde{Y}_{T\times 1} = \tilde{H}_{T\times T}\,\tilde{X}_{T\times 1} + \tilde{W}_{T\times 1} \tag{89}$$

$$E[|\tilde{X}(i)|^2] \le P_{avg} \tag{90}$$

$$|\tilde{X}(i)|^2 \le P \quad \forall\, i \in \{1, \ldots, T\} \tag{91}$$

Here, $\tilde{H}_{T\times T}$ is a diagonal matrix with entries along the main diagonal corresponding to
$(\tilde{H}(1), \ldots, \tilde{H}(T))$. The average and peak power constraints are specified by (90) and (91).
According to a bound of Medard and Gallager [4] (also see [5, Prop. II.1]):

$$I(\tilde{Y}_{T\times 1}; \tilde{X}_{T\times 1}) \le E[J_c(\tilde{X}_{T\times 1})] \tag{92}$$

where $J_c(\tilde{X}_{T\times 1})$ is the fourthegy of $\tilde{Y}_{T\times 1}$ corresponding to input $\tilde{X}_{T\times 1}$. Normalizing
with respect to the additive noise power, the noise variance is set to 1. Let $(Y_{T\times 1}, X_{T\times 1})$
denote $T$ consecutive uses of the channel modeled in (1). Since $(\tilde{Y}_{T\times 1}, \tilde{X}_{T\times 1})$ and
$(Y_{T\times 1}, X_{T\times 1})$ are statistically identical, $I(\tilde{Y}_{T\times 1}; \tilde{X}_{T\times 1}) = I(Y_{T\times 1}; X_{T\times 1})$ and the
fourthegy of $Y_{T\times 1}$ is also given by $J_c(X_{T\times 1})$. The average fourthegy is upper-bounded in the
following manner:

T / T 

i=i \j=i 

T / T 

i=i \j=i 

The above inequality follows from the peak power constraint (91). We further upper-bound the
above expression and apply Parseval's theorem to obtain

T oo 
j=l j=-oo 

This yields the following upper bound on fourthegy:

$$J_c(X_{T\times 1}) \le \sum_{i=1}^{T} |X(i)|^2 \cdot P \int_{-\pi}^{\pi} S^2(\omega)\,\frac{d\omega}{2\pi} \tag{93}$$



Combining (92) and (93) yields

$$\frac{I(Y_{T\times 1}; X_{T\times 1})}{E\big[\sum_{i=1}^{T} |X(i)|^2\big]} \le U_p \tag{94}$$

where $U_p$ is given in Equation (19). From (94) and (14), it follows that $C_p \le U_p$.



Appendix II 
PROOF OF COROLLARY 4.1

The fading process $H$ is Gauss Markov with autocorrelation function $\rho^{|n|}$ for some $\rho$
with $0 < \rho < 1$. By Proposition 3.1 the capacity per unit energy for peak constraint $P$ is
given by (67). The expression $\frac{1}{P}\int_{-\pi}^{\pi} \log(1 + PS(\omega))\,\frac{d\omega}{2\pi}$ is now evaluated for the Gauss Markov
fading process. The autocorrelation function $R_H$ of the Gauss Markov fading process is given
by $R_H(n) = \rho^{|n|}$. Its z-transform, $S(z)$, is given by

$$S(z) = \frac{1 - \rho^2}{(1 - \rho z)(1 - \rho z^{-1})} \quad \text{for } \rho < |z| < 1/\rho.$$






Note that $1 + PS(z)$ is a rational function with both numerator and denominator having degree
two. Zeros of the function $1 + PS(z)$ satisfy

$$\rho^2 z^2 - \rho z\{1 + P + \rho^2(1 - P)\} + \rho^2 = 0. \tag{95}$$

Recall that $z_+$ is the larger root of the equation

$$z^2 - z\{1 + P + \rho^2(1 - P)\} + \rho^2 = 0. \tag{96}$$

Comparing the two equations, it follows that $z_+/\rho$ is a zero of $1 + PS(z)$. Since $R_H(n)$ is even,
$S(z) = S(z^{-1})$. So, the other zero of $1 + PS(z)$ is $\rho/z_+$. (This is also evident since the product
of the roots of (95) is 1.) It follows that $1 + PS(z)$ can be written as

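$$1 + PS(z) = \frac{-\rho z + \{1 + P + \rho^2(1 - P)\} - \rho z^{-1}}{(1 - \rho z)(1 - \rho z^{-1})}
= \frac{(-\rho/z)\,(z - z_+/\rho)\,(z - \rho/z_+)}{(1 - \rho z)(1 - \rho z^{-1})}.$$

Consider the terms in the numerator of the above expression. Since
$(-\rho/z)(z - z_+/\rho) = z_+\,(z^{-1} - \rho/z_+)$, $1 + PS(z)$ can be further simplified as

$$1 + PS(z) = \frac{z_+\,(z - \rho/z_+)\,(z^{-1} - \rho/z_+)}{(1 - \rho z)(1 - \rho z^{-1})}. \tag{97}$$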



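Hence for $|z| = 1$,

$$1 + PS(z) = |f(z)|^2 \quad \text{where} \quad f(z) = \sqrt{z_+}\;\frac{1 - \rho z/z_+}{1 - \rho z}. \tag{98}$$

Since the polynomial in (95) is negative at $z = 1$ and positive as $z \to \infty$, it is clear that
$z_+/\rho > 1$. The function $f$ is analytic and nonzero in a neighborhood of the unit disk. Thus, by
Jensen's formula of complex analysis,

$$\int_{-\pi}^{\pi} \log(1 + PS(\omega))\,\frac{d\omega}{2\pi} = \int_{-\pi}^{\pi} \log|f(e^{j\omega})|^2\,\frac{d\omega}{2\pi} = \log|f(0)|^2 = \log z_+.$$

Equation (27) in Corollary 4.1 follows.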

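The integral in the expression for $U_p$, as given in (19), is simplified using Parseval's
theorem as follows:

$$\int_{-\pi}^{\pi} S^2(\omega)\,\frac{d\omega}{2\pi} = \sum_{n=-\infty}^{\infty} \rho^{2|n|} = \frac{1 + \rho^2}{1 - \rho^2}.$$

Equation (28) in Corollary 4.1 follows.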
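
Both closed forms in this appendix can be checked numerically. The sketch below is illustrative only. It assumes the Gauss Markov spectral density $S(\omega) = (1-\rho^2)/(1 - 2\rho\cos\omega + \rho^2)$ obtained by setting $z = e^{j\omega}$ in $S(z)$, and takes $z_+$ as the larger root of (96) with the middle coefficient $-\{1 + P + \rho^2(1-P)\}$. Function names and the grid size are hypothetical choices.

```python
import numpy as np

def z_plus(P, rho):
    """Larger root of z^2 - z*(1 + P + rho^2*(1 - P)) + rho^2 = 0."""
    A = 1.0 + P + rho**2 * (1.0 - P)
    return 0.5 * (A + np.sqrt(A**2 - 4.0 * rho**2))

def gauss_markov_psd(omega, rho):
    """S(w) for autocorrelation rho^|n| (S(z) evaluated on the unit circle)."""
    return (1.0 - rho**2) / (1.0 - 2.0 * rho * np.cos(omega) + rho**2)

def uniform_grid(n_grid):
    """Uniform grid on [-pi, pi); the mean over it approximates the integral dw/2pi."""
    return -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid

def log_integral(P, rho, n_grid=4096):
    """Numerical value of the integral of log(1 + P*S(w)) dw/2pi."""
    return np.mean(np.log1p(P * gauss_markov_psd(uniform_grid(n_grid), rho)))

def s_squared_integral(rho, n_grid=4096):
    """Numerical value of the integral of S(w)^2 dw/2pi."""
    return np.mean(gauss_markov_psd(uniform_grid(n_grid), rho) ** 2)

P, rho = 2.0, 0.5
print(log_integral(P, rho), np.log(z_plus(P, rho)))
print(s_squared_integral(rho), (1.0 + rho**2) / (1.0 - rho**2))
```

Each printed pair should agree to within numerical precision, since the rectangle rule converges very quickly for smooth periodic integrands.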
Appendix III
PROOF FOR COROLLARY 4.3

The proof works for both discrete-time and continuous-time channels. Let $\Gamma$ be the input
alphabet ($\Gamma = \mathbb{C}^T$ for a discrete-time channel). Since the block fading channel is a discrete
memoryless vector channel, Verdu's formulation [1] of capacity per unit cost applies here.

$$C_p(P, T) = \sup_{X \in \Gamma:\, X \ne 0} \frac{D(p_{Y|X} \,\|\, p_{Y|0})}{\|X\|_2^2} \tag{99}$$

where $X$ is understood to satisfy the peak power constraint $\|X\|_\infty \le \sqrt{P}$. Following [5, p. 812]
(discrete-time) and [5, Prop. III.2, (16)] (continuous-time), $D(p_{Y|X} \,\|\, p_{Y|0})$ can be expressed as
$\sum_i \phi(\lambda_i)$, where $\phi(\lambda) = \lambda - \log(1 + \lambda)$ and $\{\lambda_i\}$ is the set of eigenvalues of the autocorrelation
matrix (discrete-time) or autocorrelation function (continuous-time) of the signal $HX$. This signal
has rank one. So, $\lambda_1 = \|X\|_2^2$, and $\lambda_i = 0$ for $i \ne 1$. Thus,

$$D(p_{Y|X} \,\|\, p_{Y|0}) = \|X\|_2^2 - \log(1 + \|X\|_2^2). \tag{100}$$

So, the expression for capacity per unit energy with peak constraint $P$ simplifies to:

$$C_p(P, T) = 1 - \inf_{X \in \Gamma:\, X \ne 0} \frac{\log(1 + \|X\|_2^2)}{\|X\|_2^2}. \tag{101}$$

Since $\log(1 + x)/x$ is monotonic decreasing in $x$ for $x > 0$, the above infimum is achieved when
$\|X\|_2^2$ is set at its maximum allowed value $PT$. This completes the proof of Corollary 4.3.
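
As a small worked illustration of (101), the sketch below evaluates the resulting value $1 - \log(1 + PT)/(PT)$ (in nats per unit energy) and compares it with a brute-force search over the allowed energies $\|X\|_2^2 \in (0, PT]$. The function names and the search grid are hypothetical.

```python
import math

def block_fading_capacity_per_unit_energy(P, T):
    """Value of (101) with the infimum attained at ||X||^2 = P*T (natural log)."""
    x = P * T
    return 1.0 - math.log1p(x) / x

def brute_force_value(P, T, n_points=10000):
    """1 - inf over a grid of energies in (0, P*T] of log(1 + x)/x."""
    energies = [P * T * k / n_points for k in range(1, n_points + 1)]
    return 1.0 - min(math.log1p(x) / x for x in energies)

for P, T in [(1.0, 1), (1.0, 10), (0.1, 100)]:
    print(P, T, block_fading_capacity_per_unit_energy(P, T), brute_force_value(P, T))
```

Because $\log(1+x)/x$ is decreasing, the grid minimum sits at the largest energy, so the two columns coincide.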

Appendix IV 
ALTERNATIVE PROOF FOR LEMMA 6.3

Lemma 6.3 shows that, in the limit as $n \to \infty$, the expression

$$\frac{1}{n} \log\det(I_n + P\,\Sigma_n)$$

can be expressed as an integral involving $S(\omega)$, the density of the absolutely continuous part of
the spectral measure of the fading process $H$. Here, we present a brief outline of an alternative
proof of the same result.

The term $\det(I + P\Sigma_n)$ can be expanded as a product of its eigenvalues.

$$\frac{\log\det(I + P\Sigma_n)}{nP} = \frac{1}{P} \cdot \frac{1}{n} \sum_{i=1}^{n} \log(1 + P\lambda_i) \tag{102}$$

Here, $\lambda_i$ is the $i$th eigenvalue of $\Sigma_n$. The theory of circulant matrices is now applied to evaluate
this limit as an integral:

$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \log(1 + P\lambda_i) = \int_{-\pi}^{\pi} \log(1 + PS(\omega))\,\frac{d\omega}{2\pi} \tag{103}$$

The result about convergence of the log determinants of Toeplitz matrices is known as Szegő's
first limit theorem and was established by Szegő [27]. Later it came to be used in the theory of
linear prediction of random processes (see for example [22, § IV.9 Theorem 4]). Therefore,

$$\lim_{n\to\infty} \frac{\log\det(I + P\Sigma_n)}{nP} = \frac{1}{P} \int_{-\pi}^{\pi} \log(1 + PS(\omega))\,\frac{d\omega}{2\pi}. \tag{104}$$
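
The limit in (103) can be observed numerically for a concrete fading process. The following sketch is an illustration under assumptions, not part of the proof: it takes the Gauss Markov covariance $\Sigma_n(i,j) = \rho^{|i-j|}$ with spectral density $S(\omega) = (1-\rho^2)/(1 - 2\rho\cos\omega + \rho^2)$, and all function names are hypothetical.

```python
import numpy as np

def toeplitz_cov(n, rho):
    """n x n Toeplitz covariance with entries rho^|i-j| (assumed Gauss Markov fading)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def normalized_logdet(n, P, rho):
    """(1/n) * log det(I_n + P * Sigma_n), computed stably with slogdet."""
    _, logdet = np.linalg.slogdet(np.eye(n) + P * toeplitz_cov(n, rho))
    return logdet / n

def szego_limit(P, rho, n_grid=4096):
    """Right-hand side of (103): mean of log(1 + P*S(w)) over a uniform grid."""
    omega = -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid
    psd = (1.0 - rho**2) / (1.0 - 2.0 * rho * np.cos(omega) + rho**2)
    return np.mean(np.log1p(P * psd))

P, rho = 2.0, 0.5
for n in (10, 50, 200):
    print(n, normalized_logdet(n, P, rho), szego_limit(P, rho))
```

The normalized log determinant should approach the integral as `n` grows, with a gap that decays roughly like $1/n$.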



Appendix V 
PROOF OF LEMMA O 



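Since the channel modeled in (69) is a discrete-time memoryless vector channel, the
formulation of capacity per unit cost in [1] can be applied.

$$C_p^{coh}(P) = \sup_{X \ne 0} \frac{D(p_{Y|X} \,\|\, p_{Y|0})}{\|X\|_2^2} \tag{105}$$

Here, $X$ is a deterministic complex vector in $\mathbb{C}^T$. Let $\Sigma_X$ denote the covariance matrix of $Y$
conditional on $X$ being transmitted, and $\Sigma_0$ the covariance matrix of $Y$ conditional on $0$ being
transmitted. Let $\bar{X}$ denote $\mathrm{diag}(X)$, and $\Sigma$ denote the covariance matrix of the random vector
$(H(1), \ldots, H(T))^T$. The following expressions for $\Sigma_X$ and $\Sigma_0$ are immediate.

$$\Sigma_X = \begin{pmatrix} \bar{X}\Sigma\bar{X}^\dagger + I_T & \bar{X}\Sigma \\ \Sigma\bar{X}^\dagger & \Sigma \end{pmatrix} \tag{106}$$

$$\Sigma_0 = \begin{pmatrix} I_T & 0 \\ 0 & \Sigma \end{pmatrix} \tag{107}$$

The divergence expression in (105) then simplifies to

$$\begin{aligned}
D(p_{Y|X} \,\|\, p_{Y|0}) &= \log\frac{\det\Sigma_0}{\det\Sigma_X} + \mathrm{tr}\big(E_{Y|X}[\Sigma_0^{-1} Y Y^\dagger - \Sigma_X^{-1} Y Y^\dagger]\big) && (108) \\
&= \log\frac{\det\Sigma_0}{\det\Sigma_X} + \mathrm{tr}\big(\Sigma_0^{-1}\Sigma_X - I_{2T}\big) && (109) \\
&= \log\frac{\det\Sigma_0}{\det\Sigma_X} + \mathrm{tr}\big(\bar{X}\Sigma\bar{X}^\dagger\big) && (110)
\end{aligned}$$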

It is clear that $\det\Sigma_0 = \det\Sigma$. To evaluate $\det\Sigma_X$, let $\hat{\Sigma}_X$ be given by

$$\hat{\Sigma}_X = \begin{pmatrix} \mathrm{Re}(\Sigma_X) & -\mathrm{Im}(\Sigma_X) \\ \mathrm{Im}(\Sigma_X) & \mathrm{Re}(\Sigma_X) \end{pmatrix}.$$

Clearly, $\det\hat{\Sigma}_X = (\det\Sigma_X)^2$. Since $\hat{\Sigma}_X$ is a matrix with real entries, row operations that leave the
determinant unchanged yield

$$\det\hat{\Sigma}_X = (\det\Sigma)^2.$$

So, $\det\Sigma_X = \det\Sigma$. This implies that $\log\frac{\det\Sigma_0}{\det\Sigma_X} = 0$. Since the correlation matrix $\Sigma$ is
normalized to have ones on the main diagonal, $\mathrm{tr}(\bar{X}\Sigma\bar{X}^\dagger) = \sum_{i=1}^{T} |X_i|^2$. So, the divergence
expression in (110) evaluates to $\|X\|_2^2$, and the ratio $D(p_{Y|X} \,\|\, p_{Y|0})/\|X\|_2^2$ equals 1 independent of
the choice of the deterministic complex vector $X$ (as long as $X \ne 0$). This, along with (105),
proves the lemma.
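
The identity $D(p_{Y|X} \,\|\, p_{Y|0}) = \|X\|_2^2$ can be checked numerically. The sketch below is illustrative and rests on assumptions: it builds $\Sigma_X$ and $\Sigma_0$ as the block covariance matrices of the pair (received vector, fading vector) with unit-variance noise, uses the standard divergence formula between zero-mean proper complex Gaussian vectors, and picks an arbitrary example correlation $0.6^{|i-j|}$. The function name is hypothetical.

```python
import numpy as np

def coherent_divergence(x, sigma):
    """Divergence between zero-mean proper complex Gaussians CN(0, Sigma_X) and CN(0, Sigma_0)."""
    T = len(x)
    x_diag = np.diag(x)
    eye = np.eye(T)
    zero = np.zeros((T, T))
    # Covariance of (X_diag @ H + W, H) when X is sent, with unit-variance noise W.
    sigma_x = np.block([[x_diag @ sigma @ x_diag.conj().T + eye, x_diag @ sigma],
                        [sigma @ x_diag.conj().T, sigma]])
    # Covariance of (W, H) when the zero vector is sent.
    sigma_0 = np.block([[eye, zero], [zero, sigma]])
    _, logdet_x = np.linalg.slogdet(sigma_x)
    _, logdet_0 = np.linalg.slogdet(sigma_0)
    trace_term = np.trace(np.linalg.solve(sigma_0, sigma_x)).real - 2 * T
    return logdet_0 - logdet_x + trace_term

rng = np.random.default_rng(0)
T = 4
idx = np.arange(T)
sigma = 0.6 ** np.abs(idx[:, None] - idx[None, :])   # example correlation matrix
x = rng.normal(size=T) + 1j * rng.normal(size=T)     # arbitrary nonzero input

print(coherent_divergence(x, sigma), np.sum(np.abs(x) ** 2))
```

The two printed values should agree up to rounding, so the ratio of divergence to energy equals 1 for any nonzero input in this example.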